Abstract

We address the inverse problem that arises in compressed sensing of a low-rank matrix. Our approach is to pose the inverse problem as an approximation problem with a specified target rank of the solution. A simple search over the target rank then provides the minimum rank solution satisfying a prescribed data approximation bound. We propose an atomic decomposition that provides an analogy between parsimonious representations of a sparse vector and a low-rank matrix. Efficient greedy algorithms to solve the inverse problem for the vector case are extended to the matrix case through this atomic decomposition. In particular, we propose an efficient and guaranteed algorithm named ADMiRA that extends CoSaMP, its analogue for the vector case. The performance guarantee is given in terms of the rank-restricted isometry property and bounds both the number of iterations and the error in the approximate solution for the general case where the solution is approximately low-rank and the measurements are noisy. With a sparse measurement operator such as the one arising in the matrix completion problem, the computation in ADMiRA is linear in the number of measurements. The numerical experiments for the matrix completion problem show that, although the measurement operator in this case does not satisfy the rank-restricted isometry property, ADMiRA is a competitive algorithm for matrix completion.

Index Terms 

Rank minimization, performance guarantee, matrix completion, singular value decomposition, compressed sensing. 

I. Introduction

Recent studies in compressed sensing have shown that a sparsity prior in the representation of the unknowns can guarantee unique and stable solutions to underdetermined linear systems. The idea has been generalized to the matrix case [1] with the rank replacing sparsity to define the parsimony of the representation of the unknowns. Compressed sensing of a low-rank matrix addresses the inverse problem of reconstructing an unknown low-rank matrix $X_0 \in \mathbb{C}^{m \times n}$ from its linear measurements $b = \mathcal{A} X_0$¹ via a given linear operator $\mathcal{A} : \mathbb{C}^{m \times n} \to \mathbb{C}^p$. As in the vector case, the inverse problem is ill-posed in the sense that the number of measurements is much smaller than the number of unknowns. Continuing the analogy with the vector case, the remarkable fact is that the number of measurements sufficient for unique and stable recovery is roughly of the same order as the number of degrees of freedom in the unknown low-rank matrix. Moreover, under certain conditions, the recovery can be accomplished by polynomial-time algorithms [2].

One method to solve the inverse problem by exploiting the prior that $X_0$ is low-rank is to solve the rank minimization problem P1, which minimizes the rank within the affine space defined by $b$ and $\mathcal{A}$:

$$\text{P1:} \quad \min_{X \in \mathbb{C}^{m \times n}} \operatorname{rank}(X) \quad \text{subject to} \quad \mathcal{A} X = b.$$

In practice, in the presence of measurement noise or modeling error, a more appropriate measurement model is $b = \mathcal{A} X_0 + \nu$, where the perturbation $\nu$ has bounded Euclidean norm, $\|\nu\|_2 \le \eta$. In this case, the rank minimization problem is written as

$$\text{P1}': \quad \min_{X \in \mathbb{C}^{m \times n}} \operatorname{rank}(X) \quad \text{subject to} \quad \|\mathcal{A} X - b\|_2 \le \eta$$

with an ellipsoidal constraint. Indeed, rank minimization has been studied in a more general setting where the feasible set is not necessarily restricted to be either an affine space or an ellipsoid. However, due to the non-convexity of the rank, rank minimization is NP-hard even when the feasible set is convex. Fazel, Hindi, and Boyd [3] proposed a convex relaxation of the rank minimization problem by introducing a convex surrogate of $\operatorname{rank}(X)$, known as the nuclear norm $\|X\|_*$, which denotes the sum of all singular values of the matrix $X$.

Recht, Fazel, and Parrilo [2] studied rank minimization in the framework of compressed sensing and showed that rank minimization for the matrix case is analogous to $\ell_0$-norm (number of nonzero elements) minimization for the vector case. They provided an analogy between the two problems and their respective solutions by convex relaxation. In the analogy, $\ell_1$-norm minimization for the $\ell_0$-norm minimization problem is analogous to nuclear norm minimization for rank minimization. Both are efficient algorithms, with guaranteed performance under certain conditions, to solve NP-hard problems: $\ell_0$-norm minimization and rank minimization, respectively. The respective conditions are given by the sparsity-restricted isometry property [4] and the rank-restricted isometry property [2], [1]. However, whereas $\ell_1$-norm minimization corresponds to a linear program (or a quadratically constrained linear program for the noisy case), nuclear norm minimization is formulated as a convex semidefinite program (SDP). Although there exist polynomial-time algorithms to solve SDPs, in practice they do not scale well to large problems.

K. Lee and Y. Bresler are with Coordinated Science Laboratory and Department of ECE, University of Illinois at Urbana-Champaign, IL 61801 USA
e-mail: {kleeS 1 ,ybresler} @illinois.edu

¹ Linear operator $\mathcal{A}$ is not a matrix in this equation. In this paper, to distinguish general linear operators from matrices, we use calligraphic font for general linear operators.

Recently, several authors proposed methods for solving large-scale SDPs derived from rank minimization. These include interior point methods for SDP, projected subgradient methods, and low-rank parametrization [2] combined with a customized interior point method [5]. These methods can solve larger rank minimization problems than general-purpose SDP solvers can. However, the dimension of the problem is still restricted and some of these methods do not guarantee convergence to the global minimum. Cai, Candes, and Shen [6] proposed singular value thresholding (SVT), which penalizes the objective of nuclear norm minimization by the squared Frobenius norm. The dual of the penalized problem admits a projected subgradient method where the updates can be done by computing truncated singular value decompositions. They have shown that the solution given by SVT converges to the solution to nuclear norm minimization as the penalty parameter increases. However, an analysis of the convergence rate is missing and hence the quality of the solution obtained by this method is not guaranteed. Furthermore, the efficiency of SVT is restricted to the noiseless case where the constraint is affine (i.e., linear equality). Ma, Goldfarb, and Chen [7] proposed a formulation of nuclear norm minimization by using the Bregman divergence that admits an efficient fixed point algorithm, which is also based on the singular value decomposition. They did not provide a convergence rate analysis and the efficiency is also restricted to the noiseless, affine constraint case. Meka et al. [8] used multiplicative updates and online convex programming to provide an approximate solution to rank minimization. However, their result depends on the (unverified) existence of an oracle that provides the solution to the rank minimization problem with a single linear constraint in constant time.

An alternative method to solve the inverse problem of compressed sensing of a matrix is minimum rank approximation,

$$\text{P2:} \quad \min_{X \in \mathbb{C}^{m \times n}} \|\mathcal{A} X - b\|_2 \quad \text{subject to} \quad \operatorname{rank}(X) \le r,$$

where $r = \operatorname{rank}(X_0)$ denotes the minimum rank. The advantage of formulation P2 is that it can handle both the noiseless case and the noisy case in a single form. It also works for the more general case where $X_0$ is not exactly low-rank but admits an accurate approximation by a low-rank matrix. When the minimum rank $r$ is unknown, an incremental search over $r$ will increase the complexity of the solution by at most a factor of $r$. If an upper bound on $r$ is available, then a bisection search over $r$ can be used because the minimum of P2 is monotone decreasing in $r$. Hence the factor reduces to $\log r$. Indeed, this is not an issue in many applications where the rank is assumed to be a small constant.
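The bisection over the target rank can be sketched as follows. This is a minimal illustration rather than the authors' implementation: `residual_for_rank` is a hypothetical callable that returns the P2 residual attained by some rank-$r$ solver (for example ADMiRA), and the search assumes, as stated above, that this residual does not increase with $r$.

```python
from typing import Callable, Optional


def smallest_feasible_rank(
    residual_for_rank: Callable[[int], float],
    eta: float,
    r_max: int,
) -> Optional[int]:
    """Bisection for the smallest r in [0, r_max] whose residual is <= eta.

    Assumes residual_for_rank(r) is nonincreasing in r (hypothetical solver).
    Returns None if even r_max does not meet the bound.
    """
    if residual_for_rank(r_max) > eta:
        return None
    lo, hi = 0, r_max  # invariant: residual_for_rank(hi) <= eta
    while lo < hi:
        mid = (lo + hi) // 2
        if residual_for_rank(mid) <= eta:
            hi = mid        # mid is feasible; try a smaller rank
        else:
            lo = mid + 1    # mid is infeasible; the answer is larger
    return lo
```

The loop calls the solver roughly $\log_2 r_{\max}$ times, which is the $\log r$ factor mentioned above.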

Recently, several algorithms have been proposed to solve P2. Haldar and Hernando [9] proposed an alternating least squares approach by exploiting the explicit factorization of a rank-$r$ matrix. Their algorithm is computationally efficient but does not provide any performance guarantee. Keshavan, Oh, and Montanari [10] proposed an algorithm based on optimization over the Grassmann manifold. Their algorithm first finds a good starting point by an operation called trimming and then minimizes the objective of P2 using a line search and gradient descent over the Grassmann manifold. They provide a performance guarantee only for the matrix completion problem where the linear operator $\mathcal{A}$ takes a few entries from $X_0$. Moreover, the performance guarantee is restricted to the noiseless case.

Minimum rank approximation, or rank-$r$ approximation for the matrix case, is analogous to $s$-term approximation for the vector case. Like rank-$r$ matrix approximation, $s$-term vector approximation is a way to find the sparsest solution of an ill-posed inverse problem in compressed sensing. For $s$-term approximation, besides efficient greedy heuristics such as Matching Pursuit (MP) [11] and Orthogonal Matching Pursuit (OMP) [12], there are recent algorithms that are more efficient than convex relaxation and also have performance guarantees. These include Compressive Sampling Matching Pursuit (CoSaMP) [13] and Subspace Pursuit (SP) [14]. To date, no such algorithms have been available for the matrix case.

In this paper, we propose an iterative algorithm for the rank minimization problem, which is a generalization² of the CoSaMP algorithm to the matrix case. We call this algorithm "Atomic Decomposition for Minimum Rank Approximation," abbreviated as ADMiRA. ADMiRA is computationally efficient in the sense that the core computation consists of least squares and truncated singular value decompositions, which are both basic linear algebra problems and admit efficient algorithms. Indeed, ADMiRA is the first guaranteed algorithm among those proposed to solve minimum rank approximation³. Furthermore, ADMiRA provides a strong performance guarantee for P2 that covers the general case where $X_0$ is only approximately low-rank and $b$ contains noise. The strong performance guarantee of ADMiRA is comparable to that of nuclear norm minimization in [1]. In the noiseless case, SVT [6] may be considered a competitor to ADMiRA. However, for the noisy case, SVT involves more than the simple singular value thresholding operation.

² There is another generalization of CoSaMP, namely model-based CoSaMP [15]. However, this generalization addresses a completely different and unrelated problem: sparse vector approximation subject to a special (e.g., tree) structure. Furthermore, the extensions of CoSaMP to model-based CoSaMP and to ADMiRA are independent: neither one follows from the other, and neither one is a special case of the other.

³ ADMiRA [16] was followed by the algorithm by Keshavan et al. [10]. This short version [16] will be presented at ISIT'09.

Matrix completion is a special case of low-rank matrix approximation from linear measurements where the linear operator takes a few random entries of the unknown matrix. It has received considerable attention owing to its important applications such as collaborative filtering. However, the linear operator in matrix completion does not satisfy the rank-restricted isometry property [17]. Therefore, at the present time, ADMiRA does not have a guarantee for matrix completion. Nonetheless, its empirical performance on matrix completion is better than that of SVT (for the experiments in this paper).

The remainder of this paper is organized as follows. The atomic decomposition and the analogy between the greedy algorithms for the vector case and the matrix case are introduced in Section II. The new algorithm ADMiRA and its performance guarantee are explained in Section III and Section IV, respectively. Using the tools in Section V, the performance guarantees are derived in Section VI and Section VII. Implementation issues and the computational complexity are discussed in Section VIII and numerical results in Section IX, followed by conclusions. Our exposition of ADMiRA follows the line of Needell and Tropp's exposition of CoSaMP [13], to highlight, on the one hand, the close analogy, and on the other hand the differences between the two algorithms and their analysis. Indeed, there exist significant differences between rank-$r$ approximation for the matrix case and $s$-term approximation for the vector case, which are discussed in some detail.

II. Vector vs Matrix 

A. Preliminaries 

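Throughout this paper, we use two vector spaces: the space of column vectors $\mathbb{C}^p$ and the space of matrices $\mathbb{C}^{m \times n}$. For $\mathbb{C}^p$, the inner product is defined by $\langle x, y \rangle_{\mathbb{C}^p} = y^H x$ for $x, y \in \mathbb{C}^p$, where $y^H$ denotes the Hermitian transpose of $y$, and the induced Hilbert-Schmidt norm is the Euclidean or $\ell_2$-norm given by $\|x\|_2^2 = \langle x, x \rangle_{\mathbb{C}^p}$ for $x \in \mathbb{C}^p$. For $\mathbb{C}^{m \times n}$, the inner product is defined by $\langle X, Y \rangle_{\mathbb{C}^{m \times n}} = \operatorname{tr}(Y^H X)$ for $X, Y \in \mathbb{C}^{m \times n}$, and the induced norm is the Frobenius norm given by $\|X\|_F^2 = \langle X, X \rangle_{\mathbb{C}^{m \times n}}$ for $X \in \mathbb{C}^{m \times n}$.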

B. Atomic Decomposition 

Let $\Gamma$ denote the set of all nonzero rank-one matrices in $\mathbb{C}^{m \times n}$. We can refine $\Gamma$ so that any two distinct elements are not collinear. The resulting subset $\mathbb{O}$ is referred to as the set of atoms⁴ of $\mathbb{C}^{m \times n}$. Then the set of atomic spaces $\mathbb{A}$ of $\mathbb{C}^{m \times n}$ is defined by $\mathbb{A} = \{\operatorname{span}(\psi) : \psi \in \mathbb{O}\}$. Each subspace $S \in \mathbb{A}$ is one-dimensional and hence is irreducible in the sense that $S = S_1 + S_2$ for some $S_1, S_2 \in \mathbb{A}$ implies $S_1 = S_2 = S$. Since $\mathbb{O}$ is an uncountably infinite set in a finite-dimensional space $\mathbb{C}^{m \times n}$, the elements in $\mathbb{O}$ are not linearly independent. Regardless of the choice of $\mathbb{O}$, $\mathbb{A}$ is uniquely determined. Without loss of generality, we fix $\mathbb{O}$ such that all elements have unit Frobenius norm.

Given a matrix $X \in \mathbb{C}^{m \times n}$, its representation $X = \sum_j \alpha_j \psi_j$ as a linear combination of atoms is referred to as an atomic decomposition of $X$. Since $\mathbb{O}$ spans $\mathbb{C}^{m \times n}$, an atomic decomposition of $X$ exists for all $X \in \mathbb{C}^{m \times n}$. A subset $\Psi = \{\psi_j \in \mathbb{O} : \langle \psi_j, \psi_k \rangle_{\mathbb{C}^{m \times n}} = \delta_{jk}\}$ of unit-norm and pairwise orthogonal atoms in $\mathbb{O}$ will be called an orthonormal set of atoms.

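Definition 2.1: Let $\mathbb{O}$ be a set of atoms of $\mathbb{C}^{m \times n}$. Given $X \in \mathbb{C}^{m \times n}$, we define $\operatorname{atoms}(X)$ as the smallest set of atoms in $\mathbb{O}$ that spans $X$,

$$\operatorname{atoms}(X) = \arg\min_{\Psi} \{ |\Psi| : \Psi \subset \mathbb{O},\ X \in \operatorname{span}(\Psi) \}. \tag{1}$$

Note that $\operatorname{atoms}(X)$ is not unique.

An orthonormal set $\operatorname{atoms}(X) \subset \mathbb{O}$ is given by the singular value decomposition of $X$. Let $X = \sum_{k=1}^{\operatorname{rank}(X)} \sigma_k u_k v_k^H$ denote the singular value decomposition of $X$ with singular values in decreasing order. While $u_k v_k^H$ need not be in $\mathbb{O}$, for each $k$ there exists $\rho_k \in \mathbb{C}$ such that $|\rho_k| = 1$ and $\rho_k u_k v_k^H \in \mathbb{O}$. Then an orthonormal set $\operatorname{atoms}(X) \subset \mathbb{O}$ is given by

$$\operatorname{atoms}(X) = \{ \rho_k u_k v_k^H \}_{k=1}^{\operatorname{rank}(X)}.$$

Remark 2.2: $\operatorname{atoms}(X)$ and $\operatorname{rank}(X) = |\operatorname{atoms}(X)|$ of a matrix $X \in \mathbb{C}^{m \times n}$ are the counterparts of $\operatorname{supp}(x)$ and $\|x\|_0 = |\operatorname{supp}(x)|$ for a vector $x \in \mathbb{C}^n$, respectively.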
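As a small illustration of Definition 2.1 and Remark 2.2, the following NumPy sketch builds one orthonormal choice of $\operatorname{atoms}(X)$ from the SVD. The function name `atoms`, the relative tolerance `tol` used to decide the numerical rank, and the choice of phase $\rho_k = 1$ are assumptions made for this example only.

```python
import numpy as np


def atoms(X, tol=1e-10):
    """Orthonormal atomic decomposition of X via the SVD.

    Returns (sigma, psi) with X = sum_k sigma[k] * psi[k], where each psi[k]
    is a unit-Frobenius-norm rank-one matrix u_k v_k^H (phase rho_k = 1).
    """
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    # Numerical rank: singular values above a relative tolerance.
    rank = int(np.sum(s > tol * s[0])) if s.size and s[0] > 0 else 0
    # Row k of Vh is v_k^H, so the outer product below is u_k v_k^H.
    psi = [np.outer(U[:, k], Vh[k, :]) for k in range(rank)]
    return s[:rank], psi
```

The number of returned atoms equals the numerical rank of $X$, mirroring $\operatorname{rank}(X) = |\operatorname{atoms}(X)|$.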

C. Generalized Correlation Maximization 

Recht, Fazel, and Parrilo [2] showed an analogy between rank minimization P1 and $\ell_0$-norm minimization. We consider instead the rank-$r$ matrix approximation problem P2 and its analogue - the $s$-term vector approximation problem

$$\text{P3:} \quad \min_{x \in \mathbb{C}^n} \|A x - b\|_2 \quad \text{subject to} \quad \|x\|_0 \le s.$$

⁴ The "atom" in this paper is different from Mallat and Zhang's "atom" [11], which is an element in the dictionary, a finite set of vectors. In our terminology, an atom is a rank-one matrix, an element in an infinite set of vectors (in the vector space $\mathbb{C}^{m \times n}$). In both cases, however, an atom denotes an irreducible quantity - a singleton subset, not representable with fewer elements. (Indeed, for each atom $\psi$, the corresponding atomic space $\operatorname{span}(\psi)$ is irreducible.)






In Problem P3, the variable $x$ lives in the union of $s$-dimensional subspaces of $\mathbb{C}^n$, each spanned by $s$ elements in the finite set $E = \{e_1, \ldots, e_n\}$, the standard basis of $\mathbb{C}^n$. Thus the union contains all $s$-sparse vectors in $\mathbb{C}^n$. Importantly, finitely many ($\binom{n}{s}$, to be precise) subspaces participate in the union. Therefore, it is not surprising that P3 can be solved exactly by exhaustive enumeration, and finite selection algorithms such as CoSaMP are applicable.

In the rank-$r$ matrix approximation problem P2, the matrix variable $X$ lives in the union of subspaces of $\mathbb{C}^{m \times n}$, each of which is spanned by $r$ atoms in the set $\mathbb{O}$. Indeed, if $X \in \mathbb{C}^{m \times n}$ is spanned by $r$ atoms in $\mathbb{O}$, then $\operatorname{rank}(X) \le r$ by the subadditivity of the rank. Conversely, if $\operatorname{rank}(X) = r$, then $X$ is a linear combination of $r$ rank-one matrices and hence there exist $r$ atoms that span $X$. Note that uncountably infinitely many subspaces participate in the union. Therefore, some selection rules in the greedy algorithms for $\ell_0$-norm minimization and $s$-term vector approximation do not generalize in a straightforward way. Nonetheless, using our formulation of the rank-$r$ matrix approximation problem in terms of an atomic decomposition, we extend the analogy between the vector and matrix cases, and propose a way to generalize these selection rules to the rank-$r$ matrix approximation problem.

First, consider the correlation maximization in greedy algorithms for the vector case. Matching Pursuit (MP) [11] and Orthogonal Matching Pursuit (OMP) [12] choose the index $k \in \{1, \ldots, n\}$ that maximizes the correlation $|a_k^H (b - A\hat{x})|$ between the $k$-th column $a_k$ of $A$ and the residual in each iteration, where $\hat{x}$ is the solution of the previous iteration. Given a set $\Psi$, let $\mathcal{P}_\Psi$ denote the (orthogonal) projection operator onto the subspace spanned by $\Psi$ in the corresponding embedding space. When $\Psi = \{\psi\}$ is a singleton set, $\mathcal{P}_\psi$ will denote $\mathcal{P}_\Psi$. For example, $\mathcal{P}_{e_k}$ denotes the projection operator onto the subspace in $\mathbb{C}^n$ spanned by $e_k$. From

$$|a_k^H (b - A\hat{x})| = |\langle A^H (b - A\hat{x}), e_k \rangle_{\mathbb{C}^n}| = \|\mathcal{P}_{e_k} A^H (b - A\hat{x})\|_2,$$

it follows that maximizing the correlation implies maximizing the norm of the projection of the image under $A^H$ of the residual $b - A\hat{x}$ onto the selected one-dimensional subspace.

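The following selection rule generalizes the correlation maximization to the matrix case. We maximize the norm of the projection over all one-dimensional subspaces spanned by an atom in $\mathbb{O}$:

$$\max_{\psi \in \mathbb{O}} |\langle b - \mathcal{A}\hat{X}, \mathcal{A}\psi \rangle_{\mathbb{C}^p}| = \max_{\psi \in \mathbb{O}} \|\mathcal{P}_\psi \mathcal{A}^* (b - \mathcal{A}\hat{X})\|_F, \tag{2}$$

where $\mathcal{A}^* : \mathbb{C}^p \to \mathbb{C}^{m \times n}$ denotes the adjoint operator of $\mathcal{A}$. By the Eckart-Young Theorem, the basis of the best subspace is obtained from the singular value decomposition of $M = \mathcal{A}^*(b - \mathcal{A}\hat{X})$, as $\psi = u_1 v_1^H$, where $u_1$ and $v_1$ are the principal left and right singular vectors.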

Remark 2.3: Applying the selection rule (2) to update $\hat{X}$ recursively leads to greedy algorithms generalizing MP and OMP to rank minimization.

Next, consider the rule in recent algorithms such as CoSaMP and SP. The selection rule chooses the subset $J$ of $\{1, \ldots, n\}$ with $|J| = s$ defined by

$$|a_k^H (b - A\hat{x})| \ge |a_j^H (b - A\hat{x})|, \quad \forall k \in J,\ \forall j \notin J. \tag{3}$$

This is equivalent to maximizing

$$\sum_{k \in J} |a_k^H (b - A\hat{x})|^2 = \sum_{k \in J} \|\mathcal{P}_{e_k} A^H (b - A\hat{x})\|_2^2 = \|\mathcal{P}_{\{e_k\}_{k \in J}} A^H (b - A\hat{x})\|_2^2.$$

In other words, selection rule (3) finds the best subspace spanned by $s$ elements in $E$ that maximizes the norm of the projection of $M = A^H (b - A\hat{x})$ onto that $s$-dimensional subspace.

The following selection rule generalizes the selection rule (3) to the matrix case. We maximize the norm of the projection over all subspaces spanned by a subset with at most $r$ atoms in $\mathbb{O}$:

$$\max_{\Psi \subset \mathbb{O}} \{ \|\mathcal{P}_\Psi \mathcal{A}^* (b - \mathcal{A}\hat{X})\|_F : |\Psi| \le r \}.$$

A basis of the best subspace is again obtained from the singular value decomposition of $M = \mathcal{A}^*(b - \mathcal{A}\hat{X})$, as $\Psi = \{\rho_k u_k v_k^H\}_{k=1}^r$, where $u_k$ and $v_k$, $k = 1, \ldots, r$, are the $r$ principal left and right singular vectors, respectively, and for each $k$, $\rho_k \in \mathbb{C}$ satisfies $|\rho_k| = 1$⁵. Note that $\Psi$ is an orthonormal set although this is not enforced as an explicit constraint in the maximization.
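The matrix selection rule can be illustrated with a short NumPy sketch. The helper names `select_atoms` and `project` are hypothetical, atoms are stored in factored form as columns of `U_r` and `V_r` (so the $k$-th atom is $u_k v_k^H$), and the projection formula used here is only valid when the atoms are orthonormal, which holds for atoms taken from an SVD.

```python
import numpy as np


def select_atoms(M, r):
    """Factors (U_r, V_r) of the r atoms u_k v_k^H maximizing ||P_Psi M||_F."""
    U, _, Vh = np.linalg.svd(M, full_matrices=False)
    return U[:, :r], Vh[:r, :].conj().T


def project(U_r, V_r, M):
    """Project M onto span{u_k v_k^H}; assumes the atoms are orthonormal."""
    # coeffs[k] = <M, u_k v_k^H> = u_k^H M v_k
    coeffs = np.einsum("ik,ij,jk->k", U_r.conj(), M, V_r)
    return (U_r * coeffs) @ V_r.conj().T
```

With this choice the projected norm equals $(\sum_{k \le r} \sigma_k^2)^{1/2}$, where $\sigma_k$ are the singular values of $M$; any other orthonormal set of $r$ atoms gives a value that is no larger.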



III. Algorithm 

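Algorithm 1 describes ADMiRA. Intuitively, ADMiRA iteratively refines the pair $(\hat{\Psi}, \hat{X})$, with $\hat{\Psi} \subset \mathbb{O}$ and $\hat{X} \in \mathbb{C}^{m \times n}$, where $\hat{\Psi}$ is the set of $r$ atoms that spans an approximate solution $\hat{X}$ to P2. Step 4 finds a set of $2r$ atoms $\Psi'$ that spans a good approximation of $X_0 - \hat{X}$, which corresponds to the information not explained by the solution $\hat{X}$ in the previous iteration. Here ADMiRA assumes that $\mathcal{A}$ acts like an isometry on the low-rank matrix $X_0 - \hat{X}$, which implies that $\mathcal{A}^* \mathcal{A}$ acts like a (scaled) identity operator on $X_0 - \hat{X}$. Under this assumption, the $2r$ leading principal components of the proxy matrix $\mathcal{A}^*(b - \mathcal{A}\hat{X}) = \mathcal{A}^* \mathcal{A}(X_0 - \hat{X})$ are a good choice for $\Psi'$. The quality of a linear approximation of $X_0 - \hat{X}$ spanned by $\Psi'$ improves as the iterations proceed. This will be quantitatively analyzed in the proof of the performance guarantee. If $\hat{\Psi}$ and $\Psi'$ span good approximations of $\hat{X}$ and $X_0 - \hat{X}$, respectively, then $\tilde{\Psi} = \hat{\Psi} \cup \Psi'$ will span a good approximation of $X_0$. Steps 6 and 7 refine the set $\tilde{\Psi}$ into a set of $r$ atoms. We first compute a rank-$3r$ approximate solution $\tilde{X}$ and then take its best rank-$r$ approximation to get a feasible solution $\hat{X}$ with rank $r$. In the process, the set $\tilde{\Psi}$ of $3r$ atoms is also trimmed to the $r$-atom set $\hat{\Psi}$ so that it can span an approximate solution $\hat{X}$ closer to $X_0$.

Algorithm 1 ADMiRA

Input: $\mathcal{A} : \mathbb{C}^{m \times n} \to \mathbb{C}^p$, $b \in \mathbb{C}^p$, and target rank $r \in \mathbb{N}$
Output: rank-$r$ solution $\hat{X}$ to P2

1: $\hat{X} \leftarrow 0$
2: $\hat{\Psi} \leftarrow \emptyset$
3: while stop criterion is false do
4:     $\Psi' \leftarrow \arg\max_{\Psi \subset \mathbb{O}} \{ \|\mathcal{P}_\Psi \mathcal{A}^*(b - \mathcal{A}\hat{X})\|_F : |\Psi| \le 2r \}$
5:     $\tilde{\Psi} \leftarrow \Psi' \cup \hat{\Psi}$
6:     $\tilde{X} \leftarrow \arg\min_X \{ \|b - \mathcal{A}X\|_2 : X \in \operatorname{span}(\tilde{\Psi}) \}$
7:     $\hat{\Psi} \leftarrow \arg\max_{\Psi \subset \mathbb{O}} \{ \|\mathcal{P}_\Psi \tilde{X}\|_F : |\Psi| \le r \}$
8:     $\hat{X} \leftarrow \mathcal{P}_{\hat{\Psi}} \tilde{X}$
9: end while
10: return $\hat{X}$

⁵ Once the best subspace is determined, it is not required to compute the constants $\rho_k$.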

ADMiRA is guaranteed to converge to the global optimum in at most $6(r + 1)$ iterations when the assumptions of ADMiRA in Section IV are satisfied. However, similarly to the vector case [13], it is more difficult to verify that the assumptions are satisfied than to solve the recovery problem itself, and to date there is no known algorithm to perform this verification. Instead of relying on the theoretical bound on the number of iterations, we use the following empirical stopping criterion: if either the monotone decrease of $\|b - \mathcal{A}\hat{X}\|_2 / \|b\|_2$ is broken or $\|b - \mathcal{A}\hat{X}\|_2 / \|b\|_2$ falls below a given threshold, ADMiRA stops.

In terms of computation, Steps 4 and 7 involve finding a best rank-$2r$ or rank-$r$ approximation to a given matrix (e.g., by truncating the SVD), while Step 6 involves the solution of a linear least-squares problem - all standard numerical linear algebra problems. Step 5 merges two given sets of atoms in $\mathbb{O}$ by taking their union. As described in more detail in Section VIII, these computations can be further simplified and their cost reduced by storing and operating on the low-rank matrices in factored form, and by taking advantage of special structure of the measurement operator $\mathcal{A}$, such as sparsity.

Most steps of ADMiRA are similar to those of CoSaMP except Step 4 and Step 7. The common feasible set $\mathbb{O}$ of the maximization problems in Step 4 and Step 7 is infinite and not orthogonal, whereas the analogous set $E$ in CoSaMP is finite and orthonormal. As a result, the maximization problems over the infinite set $\mathbb{O}$ in ADMiRA are more difficult than those in the analogous steps of CoSaMP, which can be simply solved by selecting the coordinates with the largest magnitudes. Nonetheless, the singular value decomposition can solve the maximization problems over the infinite set efficiently.
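The steps of Algorithm 1 and the empirical stopping rule can be put together in a compact NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the operator is represented by a dense $p \times mn$ matrix `A_mat` acting on the row-major vectorization of $X$, atoms are kept as columns of factor matrices, and the names `admira`, `max_iter` and `tol` are hypothetical.

```python
import numpy as np


def admira(A_mat, b, shape, r, max_iter=50, tol=1e-8):
    """Sketch of ADMiRA for a dense operator A(X) = A_mat @ vec(X)."""
    m, n = shape
    A = lambda X: A_mat @ X.reshape(-1)
    A_adj = lambda y: (A_mat.conj().T @ y).reshape(m, n)

    dtype = np.result_type(A_mat, b)
    X_hat = np.zeros((m, n), dtype=dtype)                          # Step 1
    U_hat = np.zeros((m, 0), dtype=dtype)                          # Step 2
    V_hat = np.zeros((n, 0), dtype=dtype)
    b_norm = np.linalg.norm(b) or 1.0
    prev_res = np.inf

    for _ in range(max_iter):
        # Step 4: 2r principal atoms of the proxy matrix A^*(b - A X_hat).
        U, _, Vh = np.linalg.svd(A_adj(b - A(X_hat)), full_matrices=False)
        k = min(2 * r, U.shape[1])
        # Step 5: merge with the atoms of the current estimate.
        U_t = np.hstack([U[:, :k], U_hat])
        V_t = np.hstack([Vh[:k, :].conj().T, V_hat])
        # Step 6: least squares over the span of the merged atoms.
        G = np.stack(
            [A(np.outer(U_t[:, j], V_t[:, j].conj())) for j in range(U_t.shape[1])],
            axis=1,
        )
        alpha = np.linalg.lstsq(G, b, rcond=None)[0]
        X_tilde = (U_t * alpha) @ V_t.conj().T
        # Steps 7-8: best rank-r approximation of X_tilde and its atoms.
        U, s, Vh = np.linalg.svd(X_tilde, full_matrices=False)
        U_hat, V_hat = U[:, :r], Vh[:r, :].conj().T
        X_new = (U_hat * s[:r]) @ V_hat.conj().T
        # Empirical stop: residual stopped decreasing, or is small enough.
        res = np.linalg.norm(b - A(X_new)) / b_norm
        if res >= prev_res:
            break
        X_hat, prev_res = X_new, res
        if res < tol:
            break
    return X_hat
```

The dense representation is chosen for brevity; it forgoes the savings from factored storage and sparse operators mentioned above.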

IV. Main Results: Performance Guarantee 

A. Rank-Restricted Isometry Property (R-RIP) 

Recht et al. [2] generalized the sparsity-restricted isometry property (RIP) defined for sparse vectors to low-rank matrices. They also demonstrated "nearly isometric families" satisfying this R-RIP (with overwhelming probability). These include random linear operators generated from i.i.d. Gaussian or i.i.d. symmetric Bernoulli distributions. In order to draw the analogy with known results in $\ell_0$-norm minimization, we slightly modify their definition by squaring the norm in the inequality. Given a linear operator $\mathcal{A} : \mathbb{C}^{m \times n} \to \mathbb{C}^p$, the rank-restricted isometry constant $\delta_r(\mathcal{A})$ is defined as the minimum constant that satisfies

$$(1 - \delta_r(\mathcal{A})) \|X\|_F^2 \le \gamma \|\mathcal{A} X\|_2^2 \le (1 + \delta_r(\mathcal{A})) \|X\|_F^2, \tag{4}$$

for all $X \in \mathbb{C}^{m \times n}$ with $\operatorname{rank}(X) \le r$ for some constant $\gamma > 0$. Throughout this paper, we assume that the linear operator $\mathcal{A}$ is scaled appropriately so that $\gamma = 1$ in (4)⁶. If $\mathcal{A}$ has a small rank-restricted isometry constant $\delta_r(\mathcal{A}) \ll 1$, then (4) implies that $\mathcal{A}$ acts like an isometry (scaled by $\gamma$) on the matrices whose rank is equal to or less than $r$. In this case, $\mathcal{A}$ is called a rank-restricted isometry to indicate that the domain where $\mathcal{A}$ is nearly an isometry is restricted to the set of low-rank matrices.

⁶ If $\gamma \ne 1$, then the noise term in (6) needs to be scaled accordingly.
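To make inequality (4) concrete, the sketch below samples random rank-$r$ matrices and records the largest observed deviation of $\|\mathcal{A}X\|_2^2 / \|X\|_F^2$ from 1, with $\gamma = 1$. Sampling can only give a lower bound on $\delta_r(\mathcal{A})$ and does not verify the R-RIP. The function name, the dense matrix `A_mat` acting on the row-major vectorization of $X$, and the real Gaussian sampling are assumptions of this example.

```python
import numpy as np


def sampled_isometry_gap(A_mat, shape, r, trials=200, seed=0):
    """Largest observed | ||A X||_2^2 / ||X||_F^2 - 1 | over random rank-r X.

    This is only a lower bound on the rank-restricted isometry constant.
    """
    rng = np.random.default_rng(seed)
    m, n = shape
    worst = 0.0
    for _ in range(trials):
        X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
        ratio = np.linalg.norm(A_mat @ X.reshape(-1)) ** 2 / np.linalg.norm(X) ** 2
        worst = max(worst, abs(ratio - 1.0))
    return worst
```

An operator whose matrix has orthonormal columns is an exact isometry, so the sampled gap is zero up to rounding.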
B. Performance Guarantee

Subject to the R-RIP, the Atomic Decomposition for Minimum Rank Approximation Algorithm (ADMiRA) has a performance guarantee analogous to that of CoSaMP.

The following are the assumptions in ADMiRA:
A1: The target rank is fixed as $r$.
A2: The linear operator $\mathcal{A}$ satisfies $\delta_{4r}(\mathcal{A}) \le 0.04$.
A3: The measurement is obtained by

$$b = \mathcal{A} X_0 + \nu, \tag{5}$$

where $\nu$ is the discrepancy between the measurement and the linear model $\mathcal{A} X_0$. No assumptions are made about the matrix $X_0$ underlying the measurement, and it can be arbitrary.

Assumption A2 plays a key role in deriving the performance guarantee of ADMiRA: it enforces the rank-restricted isometry property of the linear operator $\mathcal{A}$. Although verifying that A2 is satisfied is as difficult as or more difficult than the recovery problem itself, as mentioned above, nearly isometric families that satisfy the condition in A2 have been demonstrated [2].

The performance guarantees are specified in terms of a measure of inherent approximation error, termed unrecoverable energy, defined by

$$\epsilon = \|X_0 - X_{0,r}\|_F + \frac{1}{\sqrt{r}} \|X_0 - X_{0,r}\|_* + \|\nu\|_2, \tag{6}$$

where $X_{0,r}$ denotes the best rank-$r$ approximation of $X_0$. The first two terms in $\epsilon$ define a metric of the minimum distance between the "true" matrix $X_0$ and a rank-$r$ matrix. This is analogous to the notion of a measure of compressibility of a vector in sparse vector approximation. By the Eckart-Young-Mirsky theorem [18], no rank-$r$ matrix can come closer to $X_0$ in this metric. In particular, the optimal solution to P2 cannot come closer to $X_0$ in this metric. The third term is the norm of the measurement noise, which must also limit the accuracy of the approximation provided by a solution to P2.
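The unrecoverable energy can be evaluated directly from the singular values of $X_0$, since $X_0 - X_{0,r}$ has exactly the singular values of $X_0$ beyond the $r$-th. The sketch below assumes the form $\epsilon = \|X_0 - X_{0,r}\|_F + r^{-1/2} \|X_0 - X_{0,r}\|_* + \|\nu\|_2$ with $r \ge 1$; the function name is hypothetical.

```python
import numpy as np


def unrecoverable_energy(X0, r, noise):
    """Frobenius tail + nuclear tail / sqrt(r) + noise norm (r >= 1)."""
    s = np.linalg.svd(X0, compute_uv=False)
    tail = s[r:]  # singular values of X0 minus its best rank-r approximation
    return (
        np.sqrt(np.sum(tail ** 2))
        + np.sum(tail) / np.sqrt(r)
        + np.linalg.norm(noise)
    )
```

When $X_0$ has rank at most $r$, the tail is empty and the value reduces to the noise norm.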

Theorem 4.1: Let $\hat{X}_k$ denote the estimate of $X_0$ in the $k$-th iteration of ADMiRA. For each $k \ge 0$, $\hat{X}_k$ satisfies the following recursion:

$$\|X_0 - \hat{X}_{k+1}\|_F \le 0.5 \|X_0 - \hat{X}_k\|_F + 8\epsilon,$$

where $\epsilon$ is the unrecoverable energy. From the above relation, it follows that

$$\|X_0 - \hat{X}_k\|_F \le 2^{-k} \|X_0\|_F + 16\epsilon, \quad \forall k \ge 0.$$
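The second bound can be checked by unrolling the recursion, assuming the initialization $\hat{X}_0 = 0$ used in Algorithm 1 and bounding the geometric sum by 2:

```latex
\begin{aligned}
\|X_0 - \hat{X}_k\|_F
  &\le 2^{-k}\,\|X_0 - \hat{X}_0\|_F + 8\epsilon \sum_{j=0}^{k-1} 2^{-j} \\
  &\le 2^{-k}\,\|X_0\|_F + 16\epsilon .
\end{aligned}
```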



Theorem 4.1 shows the geometric convergence of ADMiRA. In fact, convergence in a finite number of steps can be achieved as stated by the following theorem.

Theorem 4.2: After at most $6(r + 1)$ iterations, ADMiRA provides a rank-$r$ approximation $\hat{X}$ of $X_0$, which satisfies

$$\|X_0 - \hat{X}\|_F \le 17\epsilon,$$

where $\epsilon$ is the unrecoverable energy.

Depending on the spectral properties of the matrix $X_0$, even faster convergence is possible (see Section VII for details).



C. Relationship between PI, P2, and ADMiRA 

The approximation $\hat{X}$ given by ADMiRA is a solution to P2. When there is no noise in the measurement, i.e., $b = \mathcal{A} X_0$, where $X_0$ is the solution to P1, Theorem 4.1 states that if the ADMiRA assumptions are satisfied with $r \ge \operatorname{rank}(X_0)$, then $\hat{X} = X_0$. An appropriate value can be assigned to $r$ by an incremental search over $r$.

For the noisy measurement case, the linear constraint in P1 is replaced by a quadratic constraint and the rank minimization problem is written as:

$$\text{P1}': \quad \min_{X \in \mathbb{C}^{m \times n}} \operatorname{rank}(X) \quad \text{subject to} \quad \|\mathcal{A} X - b\|_2 \le \eta.$$

Let $X'$ denote a minimizer of P1'. In this case, the approximation $\hat{X}$ produced by ADMiRA is not necessarily equivalent to $X'$, but by Theorem 4.1 the distance between the two is bounded by $\|X' - \hat{X}\|_F \le 17\eta$ for all $r \ge \operatorname{rank}(X')$ that satisfy the ADMiRA assumptions.



V. Properties of the Rank-Restricted Isometry 

We introduce and prove a number of properties of the rank-restricted isometry. These properties serve as key tools for proving the performance guarantees for ADMiRA in this paper. These properties further extend the analogy between the sparse vector and the low-rank matrix approximation problems (P3 and P2, respectively), and are therefore also of interest in their own right. The proofs are contained in the Appendix.

Proposition 5.1: The rank-restricted isometry constant $\delta_r(\mathcal{A})$ is nondecreasing in $r$.

An operator satisfying the R-RIP satisfies, as a consequence, a number of other properties when composed with other linear 
operators defined by the atomic decomposition. 

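Definition 5.2: Given a set $\Psi = \{\psi_1, \ldots, \psi_{|\Psi|}\} \subset \mathbb{C}^{m \times n}$, define a linear operator $\mathcal{L}_\Psi : \mathbb{C}^{|\Psi|} \to \mathbb{C}^{m \times n}$ by

$$\mathcal{L}_\Psi \alpha = \sum_{k=1}^{|\Psi|} \alpha_k \psi_k, \quad \forall \alpha \in \mathbb{C}^{|\Psi|}. \tag{7}$$

It follows from (7) that the adjoint operator $\mathcal{L}_\Psi^* : \mathbb{C}^{m \times n} \to \mathbb{C}^{|\Psi|}$ is given by

$$(\mathcal{L}_\Psi^* X)_k = \langle X, \psi_k \rangle_{\mathbb{C}^{m \times n}}, \quad \forall k = 1, \ldots, |\Psi|,\ \forall X \in \mathbb{C}^{m \times n}. \tag{8}$$

Note that for $\mathcal{A} : \mathbb{C}^{m \times n} \to \mathbb{C}^p$, the operator composition $\mathcal{A}\mathcal{L}_\Psi : \mathbb{C}^{|\Psi|} \to \mathbb{C}^p$ admits a matrix representation. Its pseudo-inverse is denoted by $[\mathcal{A}\mathcal{L}_\Psi]^\dagger$.

Remark 5.3: If $\Psi$ is an orthonormal set, then $\mathcal{L}_\Psi$ is an isometry and $\mathcal{P}_\Psi = \mathcal{L}_\Psi \mathcal{L}_\Psi^*$. If $\Psi$ is a set of atoms in $\mathbb{O}$, then $\operatorname{rank}(\mathcal{L}_\Psi \alpha) \le |\Psi|$ for all $\alpha \in \mathbb{C}^{|\Psi|}$.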
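The synthesis operator of Definition 5.2 and its adjoint translate directly into NumPy. In this sketch a set of atoms is represented as a Python list of $m \times n$ arrays, and the function names `L` and `L_adj` are hypothetical.

```python
import numpy as np


def L(psi, alpha):
    """Synthesis operator: L_Psi alpha = sum_k alpha[k] * psi[k]."""
    return np.tensordot(alpha, np.stack(psi), axes=1)


def L_adj(psi, X):
    """Adjoint: (L_Psi^* X)[k] = <X, psi[k]> = tr(psi[k]^H X)."""
    return np.array([np.vdot(p, X) for p in psi])
```

For an orthonormal set of atoms, composing the two as `L(psi, L_adj(psi, X))` gives the projection of `X` onto their span, as stated in Remark 5.3.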

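Proposition 5.4: Suppose that linear operator $\mathcal{A} : \mathbb{C}^{m \times n} \to \mathbb{C}^p$ has the rank-restricted isometry constant $\delta_r(\mathcal{A})$. Let $\Psi$ be a set of atoms in $\mathbb{O}$ such that $|\Psi| \le r$. Then

$$\|\mathcal{P}_\Psi \mathcal{A}^* b\|_F \le \sqrt{1 + \delta_r(\mathcal{A})}\, \|b\|_2, \quad \forall b \in \mathbb{C}^p. \tag{9}$$

Proposition 5.5: Suppose that linear operator $\mathcal{A} : \mathbb{C}^{m \times n} \to \mathbb{C}^p$ has the rank-restricted isometry constant $\delta_r(\mathcal{A})$. Let $\Psi$ be a set of atoms in $\mathbb{O}$ such that $|\Psi| \le r$ and let $X \in \mathbb{C}^{m \times n}$ satisfy $\operatorname{rank}(X) \le r$. Then

$$\|\mathcal{P}_\Psi \mathcal{A}^* \mathcal{A} X\|_F \le (1 + \delta_r(\mathcal{A})) \|X\|_F. \tag{10}$$

Proposition 5.6: Suppose that linear operator $\mathcal{A} : \mathbb{C}^{m \times n} \to \mathbb{C}^p$ has the rank-restricted isometry constant $\delta_r(\mathcal{A})$. Let $\Psi$ be a set of atoms in $\mathbb{O}$ such that $|\Psi| \le r$ and let $\mathcal{P} : \mathbb{C}^{m \times n} \to \mathbb{C}^{m \times n}$ be a projection operator that commutes with $\mathcal{P}_\Psi$. Then

$$(1 - \delta_r(\mathcal{A})) \|\mathcal{P}\mathcal{P}_\Psi X\|_F \le \|\mathcal{P}\mathcal{P}_\Psi \mathcal{A}^* \mathcal{A} \mathcal{P}\mathcal{P}_\Psi X\|_F, \quad \forall X \in \mathbb{C}^{m \times n}. \tag{11}$$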



The following rank-restricted orthogonality property for the matrix case is analogous to the sparsity-restricted orthogonality 
property for the vector case (Lemma 2.1 in H). 

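Proposition 5.7: Suppose that linear operator $\mathcal{A} : \mathbb{C}^{m \times n} \to \mathbb{C}^p$ has the rank-restricted isometry constant $\delta_r(\mathcal{A})$. Let $X, Y \in \mathbb{C}^{m \times n}$ satisfy $\langle X, Y \rangle_{\mathbb{C}^{m \times n}} = 0$ and $\operatorname{rank}(X + \alpha Y) \le r$ for all $\alpha \in \mathbb{C}$. Then

$$|\langle \mathcal{A} X, \mathcal{A} Y \rangle_{\mathbb{C}^p}| \le \sqrt{2}\, \delta_r(\mathcal{A}) \|X\|_F \|Y\|_F. \tag{12}$$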



Remark 5.8: For the vector case, the representation of a vector $x \in \mathbb{C}^n$ in terms of the standard basis $\{e_j\}_{j=1}^n$ of $\mathbb{C}^n$ determines $\|x\|_0$. Let $J_1, J_2 \subset \{1, \ldots, n\}$ be arbitrary. Then the following properties hold: (i) the projection operators $\mathcal{P}_{\{e_j\}_{j \in J_1}}$ and $\mathcal{P}_{\{e_j\}_{j \in J_2}}$ commute; and (ii) $\mathcal{P}_{\{e_j\}_{j \in J_1}} x$ is $s$-sparse (or sparser) if $x$ is $s$-sparse. These properties follow from the orthogonality of the standard basis. Proposition 3.2 in [13], corresponding in the vector case to our Proposition 5.7, requires these two properties. However, the analogues of properties (i) and (ii) do not hold for the matrix case. Indeed, for $\Psi_1, \Psi_2 \subset \mathbb{O}$, the projection operators $\mathcal{P}_{\Psi_1}$ and $\mathcal{P}_{\Psi_2}$ do not commute in general and $\operatorname{rank}(\mathcal{P}_{\Psi_1} X)$ can be greater than $r$ even though $\operatorname{rank}(X) \le r$. Proposition 5.7 is a stronger version of the corresponding result (Proposition 3.2 in [13]) for the vector case in the sense that it requires a weaker condition (orthogonality between two low-rank matrices), which can be satisfied without the analogues of properties (i) and (ii).

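Corollary 5.9: Suppose that linear operator $\mathcal{A} : \mathbb{C}^{m \times n} \to \mathbb{C}^p$ has the rank-restricted isometry constant $\delta_r(\mathcal{A})$. If sets $\Psi, \Upsilon$ of atoms in $\mathbb{O}$ and matrix $X \in \mathbb{C}^{m \times n}$ satisfy $\mathcal{P}_\Upsilon\mathcal{P}_\Psi = \mathcal{P}_\Psi\mathcal{P}_\Upsilon$, $\mathcal{P}_\Upsilon X = 0$, and $|\Psi| \le r$, then

$$\|\mathcal{P}_\Upsilon\mathcal{P}_\Psi \mathcal{A}^*\mathcal{A}\mathcal{P}_\Psi X\|_F \le \sqrt{2}\,\delta_r(\mathcal{A})\,\|\mathcal{P}_\Psi X\|_F. \tag{13}$$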



Remark 5.10: For the real matrix case, Proposition 5.7 can be improved by dropping the constant $\sqrt{2}$. This improvement is achieved by replacing the parallelogram identity in the proof with the version for the real scalar field. This argument also applies to Corollary 5.9.

Finally, we relate the R-RIP to the nuclear norm, extending the analogous result [13] from the $r$-sparse vector case to the rank-$r$ matrix case.

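Proposition 5.11: If a linear map $\mathcal{A} : \mathbb{C}^{m \times n} \to \mathbb{C}^p$ satisfies

$$\|\mathcal{A}X\|_2^2 \le (1 + \delta_r(\mathcal{A}))\,\|X\|_F^2 \tag{14}$$

for all $X \in \mathbb{C}^{m \times n}$ with $\mathrm{rank}(X) \le r$, then

$$\|\mathcal{A}X\|_2 \le \sqrt{1 + \delta_r(\mathcal{A})}\left[\|X\|_F + \frac{1}{\sqrt{r}}\|X\|_*\right] \tag{15}$$

for all $X \in \mathbb{C}^{m \times n}$.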



VI. Proof of Theorem 4.1

A. Exactly Low Rank Matrix Case 

Theorem 6.1: Assume $\mathrm{rank}(X_0) \le r$ in (5). Let $X_k$ denote the estimate of $X_0$ in the $k$-th iteration of ADMiRA. Then for each $k \ge 0$, $X_k$ satisfies the following recursion:

$$\|X_0 - X_{k+1}\|_F \le 0.5\,\|X_0 - X_k\|_F + 6.5\,\|\nu\|_2.$$

From the above relation, it follows that

$$\|X_0 - X_k\|_F \le 2^{-k}\,\|X_0\|_F + 13\,\|\nu\|_2, \quad \forall k \ge 0.$$



Theorem 6.1 is proved by applying a sequence of lemmata. We generalize the proof of the performance guarantee for CoSaMP [13] to the matrix case by applying the generalized analogy proposed in this paper. The flow and the techniques used in the proofs are similar to those in [13]. However, in the matrix case, there are additional unknowns in the form of the singular vectors. Therefore, the generalization of the proofs in [13] to the matrix case is not straightforward, and the proofs are sufficiently different from those for the vector case to warrant detailed exposition. The main steps in the derivation of the performance guarantee are stated in this section and the detailed proofs are in the Appendix.

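For the proof, we study the $(k+1)$-th iteration starting with the previous result in the $k$-th iteration. Let $X_0$ denote the true solution with rank $r$. Matrix $\widehat{X}$ denotes $X_k$, which is the estimate of $X_0$ in the $k$-th (previous) iteration. Set $\widehat{\Psi}$ is the set of orthogonal atoms obtained in the previous iteration. From the residual $b - \mathcal{A}\widehat{X}$, we compute the proxy matrix $\mathcal{A}^*(b - \mathcal{A}\widehat{X})$. Set $\Psi'$ is the solution of the following low-rank approximation problem:

$$\Psi' = \arg\max_{\Psi \subset \mathbb{O}} \left\{ \|\mathcal{P}_\Psi \mathcal{A}^*(b - \mathcal{A}\widehat{X})\|_F : |\Psi| \le 2r \right\}.$$

Lemma 6.2: Let $\mathrm{rank}(X_0) \le r$ in (5). Then

$$\|\mathcal{P}_{\Psi'}^{\perp}(X_0 - \widehat{X})\|_F \le 0.24\,\|X_0 - \widehat{X}\|_F + 2.13\,\|\nu\|_2.$$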



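Lemma 6.2 shows that, subject to the rank-restricted isometry property, the set $\Psi'$ of atoms chosen in Step 4 of ADMiRA is a good set: it captures 94% ($= 1 - 0.24^2$) of the energy of the atoms in $X_0$ that were not captured by $\widehat{X}$, and the effects of additive measurement noise are bounded by a small constant. In other words, the algorithm is guaranteed to make good progress in this step.

Lemma 6.3: Let $X_0, \widehat{X} \in \mathbb{C}^{m \times n}$ and let $\Psi', \widehat{\Psi}$ be sets of atoms in $\mathbb{O}$ such that $|\Psi'| \le 2r$, $|\widehat{\Psi}| \le r$, and $\mathcal{P}_{\widehat{\Psi}}^{\perp}\widehat{X} = 0$. Let $\widetilde{\Psi} = \Psi' \cup \widehat{\Psi}$. Then

$$\|\mathcal{P}_{\widetilde{\Psi}}^{\perp} X_0\|_F \le \|\mathcal{P}_{\Psi'}^{\perp}(X_0 - \widehat{X})\|_F.$$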



Lemma 6.3 shows that the augmented set of atoms $\widetilde{\Psi}$ produced in Step 5 of the algorithm is at least as good in explaining the unknown $X_0$ as was the set $\Psi'$ in explaining the part of $X_0$ not captured by the estimate $\widehat{X}$ from the previous iteration.

Lemma 6.4: Let $\mathrm{rank}(X_0) \le r$ in (5) and let $\widetilde{\Psi}$ be a set of atoms in $\mathbb{O}$ with $|\widetilde{\Psi}| \le 3r$. Then

$$\widetilde{X} = \arg\min_{X} \left\{ \|b - \mathcal{A}X\|_2 : X \in \mathrm{span}(\widetilde{\Psi}) \right\} \tag{16}$$

satisfies

$$\|X_0 - \widetilde{X}\|_F \le 1.04\,\|\mathcal{P}_{\widetilde{\Psi}}^{\perp} X_0\|_F + 1.02\,\|\nu\|_2.$$

Lemma 6.4 shows that the least-squares step, Step 6 of the algorithm, performs almost as well as one could do with operator $\mathcal{A}$ equal to an identity operator: because $\widetilde{X}$ is restricted to $\mathrm{span}(\widetilde{\Psi})$, it is impossible to recover components of $X_0$ in $\mathrm{span}(\widetilde{\Psi})^{\perp}$. Hence, the first constant cannot be smaller than 1. A value of 1 for the second constant, the noise gain, would correspond to a perfectly conditioned system.

Lemma 6.5: Let $\mathrm{rank}(X_0) \le r$ in (5) and let $\widetilde{X}_r$ denote the best rank-$r$ approximation of $\widetilde{X}$, i.e.,

$$\widetilde{X}_r = \arg\min_{X} \left\{ \|\widetilde{X} - X\|_F : \mathrm{rank}(X) \le r \right\}.$$

Then

$$\|X_0 - \widetilde{X}_r\|_F \le 2\,\|X_0 - \widetilde{X}\|_F.$$

As expected, reducing the rank of the estimate $\widetilde{X}$ from $3r$ to $r$, to produce $\widetilde{X}_r$, increases the approximation error. However, Lemma 6.5 shows that this increase is moderate: by no more than a factor of 2.

The update $X_{k+1} = \widetilde{X}_r$ completes the $(k+1)$-th iteration. Combining all the results in the lemmata provides the proof of Theorem 6.1.

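Proof: (Theorem 6.1)

$$
\begin{aligned}
\|X_0 - X_{k+1}\|_F &= \|X_0 - \widetilde{X}_r\|_F \\
&\le 2\,\|X_0 - \widetilde{X}\|_F && \text{(Lemma 6.5)} \\
&\le 2\left(1.04\,\|\mathcal{P}_{\widetilde{\Psi}}^{\perp} X_0\|_F + 1.02\,\|\nu\|_2\right) && \text{(Lemma 6.4)} \\
&\le 2.08\,\|\mathcal{P}_{\Psi'}^{\perp}(X_0 - X_k)\|_F + 2.04\,\|\nu\|_2 && \text{(Lemma 6.3)} \\
&\le 2.08\left(0.24\,\|X_0 - X_k\|_F + 2.13\,\|\nu\|_2\right) + 2.04\,\|\nu\|_2 && \text{(Lemma 6.2)} \\
&\le 0.5\,\|X_0 - X_k\|_F + 6.5\,\|\nu\|_2.
\end{aligned}
$$

The recursion together with the fact that $\sum_{j=0}^{k} 2^{-j} \le \sum_{j=0}^{\infty} 2^{-j} = 2$ provides the final result. ■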
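
The arithmetic behind the constants 0.5 and 6.5, and the unrolling of the recursion into the bound $2^{-k}\|X_0\|_F + 13\|\nu\|_2$, can be checked numerically. The following is a minimal sketch; the function names are hypothetical, and the initial error is taken to be $\|X_0\|_F$, which assumes the iteration starts from the zero matrix.

```python
def recursion_constants(lemma62=(0.24, 2.13), lemma64=(1.04, 1.02), lemma65=2.0):
    """Chain the lemma constants: contraction factor and noise gain per iteration."""
    contraction = lemma65 * lemma64[0] * lemma62[0]
    noise_gain = lemma65 * lemma64[0] * lemma62[1] + lemma65 * lemma64[1]
    return contraction, noise_gain


def unrolled_error_bound(k, x0_norm, noise_norm, rho=0.5, gain=6.5):
    """Apply err <- rho * err + gain * noise_norm k times, starting from x0_norm."""
    err = x0_norm
    for _ in range(k):
        err = rho * err + gain * noise_norm
    return err


contraction, noise_gain = recursion_constants()
print(contraction, noise_gain)  # both stay below 0.5 and 6.5 respectively
```

The geometric series $\sum_j 2^{-j} = 2$ is what turns the per-iteration noise gain 6.5 into the overall constant 13.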
B. General Matrix Case 

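Theorem 4.1 is proved by combining Theorem 6.1 and the following lemma, which shows how to convert the mismodeling error (deviations of $X_0$ from a low-rank matrix) to an equivalent additive measurement noise with a quantified norm.

Lemma 6.6: Let $X_0$ be an arbitrary matrix in $\mathbb{C}^{m \times n}$. The measurement $b = \mathcal{A}X_0 + \nu$ is also represented as $b = \mathcal{A}X_{0,r} + \widetilde{\nu}$ where

$$\|\widetilde{\nu}\|_2 \le 1.02\left[\|X_0 - X_{0,r}\|_F + \frac{1}{\sqrt{r}}\|X_0 - X_{0,r}\|_*\right] + \|\nu\|_2.$$

Proof: Let $\widetilde{\nu} = \mathcal{A}(X_0 - X_{0,r}) + \nu$. Then $b = \mathcal{A}X_{0,r} + \widetilde{\nu}$ and

$$
\begin{aligned}
\|\widetilde{\nu}\|_2 &\le \|\mathcal{A}(X_0 - X_{0,r})\|_2 + \|\nu\|_2 \\
&\le \sqrt{1 + \delta_r(\mathcal{A})}\left[\|X_0 - X_{0,r}\|_F + \frac{1}{\sqrt{r}}\|X_0 - X_{0,r}\|_*\right] + \|\nu\|_2,
\end{aligned}
$$

where the last inequality holds by Proposition 5.11. The inequality $\delta_r(\mathcal{A}) \le \delta_{4r}(\mathcal{A}) \le 0.04$ implies $\sqrt{1 + \delta_r(\mathcal{A})} \le 1.02$. ■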

Proof: (Theorem 4.1) Let $X_0$ be an arbitrary matrix in $\mathbb{C}^{m \times n}$. The measurement is given by $b = \mathcal{A}X_{0,r} + \widetilde{\nu}$, where $\widetilde{\nu}$ is defined in Lemma 6.6. By Theorem 6.1,

$$\|X_{0,r} - X_{k+1}\|_F \le 0.5\,\|X_{0,r} - X_k\|_F + 6.5\,\|\widetilde{\nu}\|_2.$$

Applying the triangle inequality and the above inequality,

$$
\begin{aligned}
\|X_0 - X_{k+1}\|_F &\le \|X_{0,r} - X_{k+1}\|_F + \|X_0 - X_{0,r}\|_F \\
&\le 0.5\,\|X_{0,r} - X_k\|_F + 6.5\,\|\widetilde{\nu}\|_2 + \|X_0 - X_{0,r}\|_F.
\end{aligned}
$$

Using the upper bound on $\|\widetilde{\nu}\|_2$ yields

$$
\begin{aligned}
\|X_0 - X_{k+1}\|_F &\le 0.5\,\|X_0 - X_k\|_F + 7.63\,\|X_0 - X_{0,r}\|_F + \frac{6.63}{\sqrt{r}}\|X_0 - X_{0,r}\|_* + 6.5\,\|\nu\|_2 \\
&< 0.5\,\|X_0 - X_k\|_F + 8\epsilon,
\end{aligned}
$$

where $\epsilon$ is the unrecoverable energy. ■

VII. Required Number of Iterations

Theorem 4.2 provides a uniform bound on the number of iterations required to achieve the guaranteed approximation accuracy. In addition to this uniform iteration bound, Theorem 7.6 in this section shows that even faster convergence may be expected for matrices $X_0$ with clustered singular values.

In the analysis of the iteration number, the distribution of the singular values of the matrices involved is the only thing that matters. Indeed, the singular vectors do not play any role in the analysis. As a consequence, the proofs for the vector case (CoSaMP) and the matrix case (ADMiRA) are very similar, and the corresponding bounds on the number of iterations coincide. However, for completeness, we provide the proofs for the matrix case.

Definition 7.1: Given X e C"»xn^ atoms(X) is defined in ([T]i. We define the atomic bands of X by 

$$B_j \triangleq \left\{ \psi \in \mathrm{atoms}(X) : 2^{-(j+1)}\|X\|_F^2 < \|\mathcal{P}_\psi X\|_F^2 \le 2^{-j}\|X\|_F^2 \right\}, \quad \text{for } j \in \mathbb{Z}_+,$$

where $\mathbb{Z}_+$ denotes the set of nonnegative integers. Note that atomic bands are disjoint subsets of $\mathrm{atoms}(X)$, which is an orthonormal set of atoms in $\mathbb{O}$, and therefore atomic bands are mutually orthogonal. From the atomic bands, the profile of $X$ is defined as the number of nonempty atomic bands, i.e.,

$$\mathrm{profile}(X) \triangleq |\{ j : B_j \ne \emptyset \}|. \tag{17}$$

From the definition, $\mathrm{profile}(X) \le \mathrm{rank}(X)$.

The atomic bands and $\mathrm{profile}(X)$ admit a simple interpretation in terms of the spectrum of $X$. Let $\mathrm{atoms}(X) = \{\psi_k\}_{k=1}^{\mathrm{rank}(X)}$ be ordered so that $\|\mathcal{P}_{\psi_k}X\|_F \ge \|\mathcal{P}_{\psi_{k+1}}X\|_F$. Then $\|\mathcal{P}_{\psi_k}X\|_F = \sigma_k$, where $\sigma_k$ is the $k$-th singular value of $X$, in decreasing order. Let

$$\tilde{\sigma}_k \triangleq \frac{\sigma_k}{\|X\|_F}.$$

Then $B_j = \{\psi_k : -(j+1) < \log_2 \tilde{\sigma}_k^2 \le -j\}$. In other words, $B_j$ contains the atoms of $X$ corresponding to normalized singular values falling in a one-octave interval ("bin"). The quantity $\mathrm{profile}(X)$ then is the number of such occupied octave bins, and measures the spread of singular values of $X$ on a log scale.
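
The octave-bin interpretation translates directly into a short computation on the singular values. The sketch below is illustrative only; the function names are hypothetical, and it takes the singular values as input rather than the matrix itself.

```python
import numpy as np


def atomic_band_indices(singular_values):
    """Band index j of each nonzero singular value.

    Band j collects normalized squared singular values in (2^-(j+1), 2^-j].
    """
    s = np.asarray(singular_values, dtype=float)
    s = s[s > 0]
    energy = s ** 2 / np.sum(s ** 2)  # ||P_psi X||_F^2 / ||X||_F^2 per atom
    # energy in (2^-(j+1), 2^-j]  <=>  j = floor(-log2(energy))
    return np.floor(-np.log2(energy)).astype(int)


def profile(singular_values):
    """Number of nonempty atomic bands (occupied octave bins)."""
    return len(set(atomic_band_indices(singular_values).tolist()))


print(profile([1.0, 1.0, 1.0, 1.0]))  # equal singular values share one band
print(profile([8.0, 4.0, 2.0, 1.0]))  # spread-out singular values occupy several bands
```

A matrix whose singular values are tightly clustered has a profile far below its rank, which is the situation in which the iteration bounds of this section improve on the uniform bound.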

Remark 7.2: For the vector case, the term analogous to the atomic band is the component band |fT9ll defined by 

$$B_j \triangleq \left\{ k \in \{1, \ldots, n\} : 2^{-(j+1)}\|x\|_2^2 < |x_k|^2 \le 2^{-j}\|x\|_2^2 \right\}, \quad \text{for } j \in \mathbb{Z}_+,$$

for x e C". 

First, the number of iterations for the exactly low-rank case is bounded by the following theorem. 

Theorem 7.3: Let $X_0 \in \mathbb{C}^{m \times n}$ be a rank-$r$ matrix and let $t = \mathrm{profile}(X_0)$, where $\mathrm{profile}(X_0)$ is defined in (17). Then after at most

$$t \log_{4/3}\left(1 + 4.3\sqrt{r/t}\right) + 6$$
iterations, the estimate X produced by ADMiRA satisfies 



We introduce additional notation for the proof of Theorem 7.3. Let $X_k$ denote the estimate at the $k$-th iteration of ADMiRA. For a nonnegative integer $j$, we define an auxiliary matrix

$$Y_j \triangleq \sum_{k \ge j} \mathcal{P}_{B_k} X_0.$$

Then $Y_j$ satisfies

$$\|Y_j\|_F^2 \le \sum_{k \ge j} 2^{-k}\,\|X_0\|_F^2\,|B_k|. \tag{18}$$

The proof of Theorem 7.3 is done by a sequence of lemmata. The first lemma presents two possibilities in each iteration of ADMiRA: if the iteration is successful, the approximation error is small; otherwise, the approximation error is dominated by the unidentified portion of the matrix and the approximation error in the next iteration decreases by a constant ratio.

Lemma 7.4: Let $\mathrm{rank}(X_0) \le r$ in (5). Matrix $X_k$ denotes the estimate of $X_0$ in the $k$-th iteration of ADMiRA. Let $\widehat{\Psi}_k$ denote $\mathrm{atoms}(X_k)$. In each iteration of ADMiRA, at least one of the following holds: either

$$\|X_0 - X_k\|_F \le 70\,\|\nu\|_2, \tag{19}$$

or

$$\|X_0 - X_k\|_F \le 2.15\,\|\mathcal{P}_{\widehat{\Psi}_k}^{\perp} X_0\|_F \tag{20}$$

and

$$\|X_0 - X_{k+1}\|_F \le \frac{3}{4}\,\|X_0 - X_k\|_F. \tag{21}$$

Lemma 7.5: Fix $K = \lfloor t \log_{4/3}(1 + 4.3\sqrt{r/t}) \rfloor$. Assume that (20) and (21) are in force for each iteration. Then $\mathrm{atoms}(X_K) = \mathrm{atoms}(X_0)$.

Next, the result is extended to the approximately low-rank case by using, once again, Lemma 6.6.

Theorem 7.6: Let $X_0 \in \mathbb{C}^{m \times n}$ be an arbitrary matrix and let $t = \mathrm{profile}(X_{0,r})$. Then, after at most

$$t \log_{4/3}\left(1 + 4.3\sqrt{r/t}\right) + 6$$

iterations, the estimate $\widehat{X}$ produced by ADMiRA satisfies

$$\|X_0 - \widehat{X}\|_F \le 17\epsilon,$$

where $\epsilon$ is the unrecoverable energy.

Proof: (Theorem 4.2) As a function of $t$, $t \log_{4/3}(1 + 4.3\sqrt{r/t}) + 6$ is maximized when $t = r$. Since $\log_{4/3} 5.3 < \log_{4/3} 5.6 < 6$, the number of iterations is at most $6(r + 1)$. Therefore, the guaranteed approximation accuracy of ADMiRA is achieved within $6(r + 1)$ iterations for any matrix $X_0$. ■

Theorem 7.6 is also of independent interest, because the bound it provides reveals that even faster convergence can be achieved for matrices $X_0$ with small $\mathrm{profile}(X_{0,r}) \ll r$. Recall the relationship between $\mathrm{profile}(X_{0,r})$ and the distribution of the $r$ largest singular values of $X_0$. It follows that the number of iterations in ADMiRA required for convergence is roughly proportional to the number of clusters of singular values of $X_{0,r}$ on a log scale.
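
To get a feel for how much the profile-dependent bound can improve on the uniform bound $6(r+1)$, the two can be evaluated side by side. This is a minimal sketch with a hypothetical function name; it only evaluates the expressions stated in Theorems 7.3 and 7.6 and in the proof of Theorem 4.2.

```python
import math


def iteration_bound(r, t):
    """Profile-dependent bound t * log_{4/3}(1 + 4.3 * sqrt(r / t)) + 6."""
    return t * math.log(1 + 4.3 * math.sqrt(r / t), 4.0 / 3.0) + 6


r = 10
for t in (1, 2, 5, 10):
    # The bound grows with the profile t and never exceeds 6 * (r + 1).
    print(t, iteration_bound(r, t), 6 * (r + 1))
```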






VIII. Implementation and Scalability 

We analyze the computational complexity of ADMiRA and will show that ADMiRA scales well to large problem instances. 
Each iteration of ADMiRA consists of procedures requiring the following basic operations: application of A and A*, singular 
value decompositions, and solving a least square problem. We analyze the computational cost of the procedures in terms of 
the complexity of the basic operations, which will depend on the properties of A. First note that ADMiRA keeps the matrix 
variables (except the proxy matrix) in factorized form through their atomic decomposition, which is advantageous for both the 
computational efficiency and memory requirements. Furthermore, the proxy matrix is often sparse in applications such as the 
matrix completion problem. 

Computing the proxy matrix: this involves the application of $\mathcal{A}$ and $\mathcal{A}^*$. The procedure first computes the residual $y = b - \mathcal{A}\widehat{X}$ and then computes the proxy matrix $\mathcal{A}^*y$. Let $\widehat{X} = \sum_{j=1}^{r} \sigma_j u_j v_j^H$ denote the atomic decomposition of $\widehat{X}$. Here the $u_j v_j^H$ are not necessarily orthogonal. $(\mathcal{A}\widehat{X})_k$ can be computed by $\langle \widehat{X}, Z_k \rangle_{\mathbb{C}^{m \times n}} = \sum_{j=1}^{r} \sigma_j v_j^H Z_k^H u_j$, $k = 1, \ldots, p$, for an appropriate set of $p$ matrices $Z_k \in \mathbb{C}^{m \times n}$. Then $\mathcal{A}^*y$ can be computed by $\sum_{k=1}^{p} y_k Z_k$. The complexity of these operations will depend on the sparsity of $\mathcal{A}$.

Case 1: $\mathcal{A}$ is an arbitrary (dense) linear operator, and the costs of computing $\mathcal{A}\widehat{X}$ and $\mathcal{A}^*y$ are $O(prmn)$ and $O(pmn)$, respectively.

Case 2: $\mathcal{A}$ is a sparse linear operator, so the $Z_k$ have $O(m + n)$ nonzero elements, and the costs of computing $\mathcal{A}\widehat{X}$ and $\mathcal{A}^*y$ are $O(pr(m + n))$ and $O(p(m + n))$, respectively.

Case 3: $\mathcal{A}$ is an extremely sparse linear operator (such as in the matrix completion problem), so the $Z_k$ have $O(1)$ nonzeros, and the costs of computing $\mathcal{A}\widehat{X}$ and $\mathcal{A}^*y$ are $O(pr)$ and $O(p)$, respectively.
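
Case 3 can be illustrated for matrix completion, where each $Z_k$ is the indicator of one sampled entry. The sketch below is not the authors' Matlab implementation; the function names and argument layout are assumptions. For simplicity the proxy is returned as a dense array, whereas an implementation that aims at the $O(p)$ cost would keep it in a sparse format.

```python
import numpy as np


def apply_A(U, s, V, rows, cols):
    """Sampled entries of X = U diag(s) V^H at positions (rows[k], cols[k]).

    Each entry costs O(r), so the total cost is O(p r); X is never formed.
    """
    return np.sum(U[rows] * s * np.conj(V[cols]), axis=1)


def apply_A_adjoint(y, rows, cols, shape):
    """Proxy A* y = sum_k y_k Z_k, with Z_k the indicator of entry (rows[k], cols[k])."""
    proxy = np.zeros(shape, dtype=np.result_type(y, float))
    np.add.at(proxy, (rows, cols), y)  # accumulates correctly even for repeated positions
    return proxy


rng = np.random.default_rng(0)
U, V, s = rng.standard_normal((6, 2)), rng.standard_normal((5, 2)), np.array([3.0, 1.0])
rows, cols = np.array([0, 2, 5, 3]), np.array([1, 4, 0, 3])
residual = rng.standard_normal(4) - apply_A(U, s, V, rows, cols)
print(apply_A_adjoint(residual, rows, cols, (6, 5)))
```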

Finding the $2r$ principal atoms of the proxy matrix: this involves the truncated singular value decomposition with $2r$ dominant singular triplets, which can be computed by the Lanczos method at a cost of $O(mnrL)$, where $L$ denotes the number of Lanczos iterations per singular value, which depends on the singular value distribution. An alternative approach is to use recent advances in low-rank approximation of large matrices based on randomized algorithms (cf. [20], [21], and the references therein) that compute the low-rank approximation of a given matrix in time linear in the size of the matrix. These randomized algorithms are useful when the size of the matrix is large but the rank $r$ remains a small constant. For example, the complexity of Har-Peled's algorithm [20] is $O(mnr^2 \log r)$. When $\mathcal{A}$ is sparse with $O(1)$ nonzero elements per $Z_k$, the matrix-vector product $(\mathcal{A}^*y)w$ for $w \in \mathbb{C}^n$ can be computed as $\sum_{k=1}^{p} y_k Z_k w$, and hence the complexity reduces to $O(prL)$ for the Lanczos method and $O(pr^2 \log r)$ for the randomized method, respectively.

Solving least-squares problems: ADMiRA requires the solution of an over-determined system with $p$ equations and $3r$ unknowns. The complexity is $O(pr^2)$. Similarly to CoSaMP, the Richardson iteration or the conjugate gradient method can be used to improve the complexity of this part. The convergence of the Richardson iteration is guaranteed owing to the R-RIP assumption of ADMiRA, and the complexity is $O(pr)$.

Finding the $r$ principal atoms of the solution to the least-squares problem: this also involves the truncated singular value decomposition of the least-squares solution $\widetilde{X}$. In fact, this procedure can be done more efficiently by exploiting the fact that $\widetilde{X}$ is available in a factorized form $\widetilde{X} = U \Sigma V^H$, where $U \in \mathbb{C}^{m \times 3r}$, $V \in \mathbb{C}^{n \times 3r}$, and $\Sigma$ is a $3r \times 3r$ diagonal matrix. Here $U, V$ do not consist of orthogonal columns in general. Let $U = Q_U R_U$ and $V = Q_V R_V$ denote the (thin) QR factorizations of $U$ and $V$, respectively, so that $Q_U^H Q_U = I_{3r}$ and $Q_V^H Q_V = I_{3r}$. Now let $W D Z^H$ denote the singular value decomposition of the $3r \times 3r$ matrix $R_U \Sigma R_V^H$. Then we have the desired singular value decomposition $\widetilde{X} = (Q_U W) D (Q_V Z)^H$. The complexity is $O((m + n + r)r^2)$, which is negligible compared to a direct SVD of $\widetilde{X}$.
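
The QR-based procedure can be sketched in a few lines of NumPy. The function name is hypothetical, and the sketch assumes the factors are given as dense arrays with a small number of columns.

```python
import numpy as np


def svd_of_factorized(U, sigma, V):
    """SVD of X = U diag(sigma) V^H without forming the m-by-n matrix X.

    U and V need not have orthonormal columns. Returns (L, d, R) with
    X = L diag(d) R^H and orthonormal columns in L and R.
    """
    Qu, Ru = np.linalg.qr(U)  # thin QR: Qu is m-by-k, Ru is k-by-k
    Qv, Rv = np.linalg.qr(V)
    W, d, Zh = np.linalg.svd(Ru @ np.diag(sigma) @ Rv.conj().T)  # small k-by-k SVD
    return Qu @ W, d, Qv @ Zh.conj().T


rng = np.random.default_rng(1)
U, V = rng.standard_normal((30, 6)), rng.standard_normal((20, 6))
sigma = rng.uniform(0.5, 2.0, 6)
L, d, R = svd_of_factorized(U, sigma, V)
# Keeping the leading r columns of L and R and the leading r entries of d
# gives the r principal atoms.
print(d)
```

Only the small $k \times k$ core (with $k = 3r$ in ADMiRA) goes through a dense SVD; the work on the tall factors is limited to two QR factorizations and two matrix products.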

Applications of $\mathcal{A}$ and $\mathcal{A}^*$ are the most demanding procedures of ADMiRA for a dense linear operator $\mathcal{A}$. These operations are also required in all other algorithms for P1, P1', or P2. To overcome this computational complexity, the linear operator $\mathcal{A}$ should have some structure that admits efficient computation. Examples include random Toeplitz matrices and randomly subsampled Fourier measurements. For matrix completion, $\mathcal{A}$ is sparse with $O(1)$ cost per measurement, and hence these operations are dominated by the remaining operations. In this case, the computation of the truncated singular value decomposition is the most demanding procedure of ADMiRA. Equipped with the randomized low-rank approximation, ADMiRA has complexity of $O(pr^2 \log r)$ per iteration, or $O(pr^3 \log r)$ to achieve the guarantee in Theorem 4.2. ADMiRA therefore has complexity linear in the size $p$ of the data, and it scales well to large problems.

IX. Numerical Experiment 

We tested the performance of ADMiRA with an operator $\mathcal{A}$ generated by a Gaussian ensemble, which satisfies RIP with high probability. ADMiRA performed well in this case, as predicted by our theory. Here we study reconstructions by ADMiRA with a generic matrix completion example. Note that the performance guarantee in terms of R-RIP does not apply to this case, because the linear operator in the matrix completion problem does not satisfy the RIP. Nonetheless, we want to check the empirical performance of ADMiRA in this practically important application. Our Matlab implementation uses PROPACK [22] (an implementation of the Lanczos algorithm) to compute partial SVDs in Steps 4 and 7 of ADMiRA. The test matrix $X_0 \in \mathbb{R}^{n \times n}$ is generated as the product $X_0 = Y_L Y_R^H$, where $Y_L, Y_R \in \mathbb{R}^{n \times r}$ have entries following an i.i.d. Gaussian distribution. The measurement $b$ consists of $p$ randomly chosen entries of $X_0$, which may be contaminated with additive white Gaussian noise. The reconstruction error and measurement noise level are measured in terms of $\mathrm{SNR}_{\mathrm{recon}} \triangleq 20 \log_{10}(\|X_0\|_F / \|X_0 - \widehat{X}\|_F)$ and $\mathrm{SNR}_{\mathrm{meas}} \triangleq 20 \log_{10}(\|b\|_2 / \|\nu\|_2)$, respectively. Computational efficiency is measured by the number of iterations. Here we
stopped the algorithm when ||6 — ^X||2/ ||6||, < lO^**. As a result, the algorithm provided SNRiccon around 70dB for the 
ideal (noiseless and exactly low-rank) case when it was successful. However, it is still possible to get higher $\mathrm{SNR}_{\mathrm{recon}}$ with
a few more iterations. The results in Fig. 1, Table I, and Table II have been averaged over 20 trials.
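
The test setup described above can be sketched as follows. This is not the authors' Matlab code: the function names, the sampling of positions without replacement, and the choice to scale the noise relative to the noiseless samples are all assumptions made for illustration.

```python
import numpy as np


def make_completion_problem(n, r, p, snr_meas_db=None, seed=0):
    """Random rank-r test matrix and p sampled entries, optionally with noise."""
    rng = np.random.default_rng(seed)
    Y_L = rng.standard_normal((n, r))
    Y_R = rng.standard_normal((n, r))
    X0 = Y_L @ Y_R.T
    flat = rng.choice(n * n, size=p, replace=False)  # assumption: distinct positions
    rows, cols = np.unravel_index(flat, (n, n))
    b = X0[rows, cols]
    if snr_meas_db is not None:
        noise = rng.standard_normal(p)
        # Assumption: noise level is set relative to the noiseless samples.
        noise *= np.linalg.norm(b) / (np.linalg.norm(noise) * 10 ** (snr_meas_db / 20))
        b = b + noise
    return X0, rows, cols, b


def snr_db(reference, error):
    """20 log10(||reference|| / ||error||); Frobenius norm for matrices, 2-norm for vectors."""
    return 20 * np.log10(np.linalg.norm(reference) / np.linalg.norm(error))


X0, rows, cols, b = make_completion_problem(n=50, r=2, p=500, snr_meas_db=20)
print(snr_db(X0[rows, cols], b - X0[rows, cols]))
```

Given an estimate `X_hat`, the reconstruction quality would then be `snr_db(X0, X0 - X_hat)`.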

Fig. 1 shows that both $\mathrm{SNR}_{\mathrm{recon}}$ and the number of iterations improve as $p/d_r$ increases. Here $d_r$ is the number of degrees of freedom in a real rank-$r$ matrix, defined by $d_r = r(n + m - r)$, and denotes the essential number of unknowns. Fig. 1 suggests that we need $p/d_r \ge 20$ for $n = 500$.

Candès and Recht [11] showed that $p = O(n^{1.2} r \log_{10} n)$ known entries suffice to complete an unknown $n \times n$ rank-$r$ matrix. Table I shows that ADMiRA provides nearly perfect recovery of random matrices from $p$ known entries, where $p = 10\lceil n^{1.2} r \log_{10} n \rceil$. Although $\mathrm{SNR}_{\mathrm{recon}}$ in the noiseless measurement case is high enough to say that the completion is nearly perfect, the number of iterations increases as $n$ increases. We are studying whether this increase in iterations with $n$ might be an artifact of our numerical implementation of ADMiRA. In the noisy measurement case the number of iterations is low and does not increase with problem size $n$. Because in most if not all practical applications the data will be noisy, or the matrix to be recovered only approximately low rank, this low and constant number of iterations is of practical significance.
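
The sampling columns of Table I follow from the two formulas above, $p = 10\lceil n^{1.2} r \log_{10} n \rceil$ and $d_r = r(n + m - r)$ with $m = n$. A minimal sketch (the function name is hypothetical):

```python
import math


def table1_sampling(n, r=2):
    """Number of samples p, sampling ratio p / n^2, and oversampling p / d_r."""
    p = 10 * math.ceil(n ** 1.2 * r * math.log10(n))
    d_r = r * (2 * n - r)  # degrees of freedom of an n-by-n rank-r matrix
    return p, p / n ** 2, p / d_r


for n in (500, 1000, 4500):
    p, ratio, oversampling = table1_sampling(n)
    print(n, round(ratio, 2), round(oversampling))
```

The sampling ratio $p/n^2$ shrinks with $n$ while the oversampling factor $p/d_r$ grows, which is the trend visible in the second and third columns of Table I.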

Table II shows that in most of the examples tested, ADMiRA provides slightly better performance with less computation than SVT [6]. Roughly, the computational complexity of a single iteration of ADMiRA can be compared to two times that of SVT.

Fig. 2 compares the phase transitions of ADMiRA and SVT. We count the number of successful matrix completions ($\mathrm{SNR}_{\mathrm{recon}} \ge 70$ dB) out of 10 trials for each triplet $(n, p, r)$. Brighter color implies more success. ADMiRA performed better than SVT for this example.

We emphasize that all comparisons with SVT were performed for the noiseless, exactly low-rank matrix case, because the current implementation [23] and theory [6] of SVT do not support the ellipsoidal constraint case. We are not aware of an efficient, scalable algorithm other than ADMiRA that supports the ellipsoidal constraint.

X. Conclusion 

We proposed a new algorithm, ADMiRA, which extends both the efficiency and the performance guarantee of the CoSaMP algorithm for $\ell_0$-norm minimization to matrix rank minimization. The proposed generalized correlation maximization can also be applied to MP, OMP, and SP and their variants to similarly extend the known algorithms and theory from the $s$-term vector approximation problem to the rank-$r$ matrix approximation problem. ADMiRA can handle large-scale rank minimization problems efficiently by using recent linear-time algorithms for low-rank approximation of a known matrix. Our numerical experiments demonstrate that ADMiRA is an effective algorithm even when the R-RIP is not satisfied, as in the matrix completion problem.

Fig. 1. Completion of random matrices by ADMiRA: n = m = 500, r = 2. (Horizontal axis of both panels: $p/d_r$.)



                        no noise                   SNR_meas = 20 dB
   n     p/n^2   p/d_r  SNR_recon (dB)   #iter     SNR_recon (dB)   #iter
  500    0.37     47         83             8           34             5
 1000    0.24     60         83             9           34             5
 1500    0.18     69         82            11           35             5
 2000    0.15     76         81            12           35             5
 2500    0.13     81         81            18           36             5
 3000    0.12     86         81            24           36             5
 3500    0.10     90         81            26           36             5
 4000    0.09     95         80            32           36             5
 4500    0.09     98         81            37           36             5

TABLE I

Completion of random matrices by ADMiRA: $n = m$, $r = 2$, $p = 10\lceil n^{1.2} r \log_{10} n \rceil$.



While the performance guarantee in this paper relies on the R-RIP, it seems that a performance guarantee for ADMiRA without 
using the R-RIP might be possible. 

Appendix 

A. Proof of Proposition 15.71 

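The rank-restricted isometry constant $\delta_r(\mathcal{A})$ can be represented as

$$\delta_r(\mathcal{A}) = \max\left\{ [\sigma_{r,\max}(\mathcal{A})]^2 - 1,\; 1 - [\sigma_{r,\min}(\mathcal{A})]^2 \right\}, \tag{22}$$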




Fig. 2. Phase transition of matrix completion: n = m = 100.

                        SNR_recon (dB)        #iter
   r    p/n^2   p/d_r   ADMiRA    SVT     ADMiRA    SVT
   2    0.05    12.51     77       74       259     143
        0.10    25.03     79       77        56      77
        0.15    37.54     81       78        20      61
        0.20    50.05     82       79        11      54
        0.25    62.56     84       79         8      49
        0.30    75.08     84       79         7      46
   5    0.05     5.01     19       37        99     500
        0.10    10.03     77       76        89     100
        0.15    15.04     78       77        32      75
        0.20    20.05     81       78        15      64
        0.25    25.06     82       79        11      57
        0.30    30.08     83       79         8      53
  10    0.05     2.51      7       -9        28     451
        0.10     5.03     30       74       194     205
        0.15     7.54     77       76        50      99
        0.20    10.05     79       77        19      80
        0.25    12.56     80       78        13      69
        0.30    15.08     80       78        10      62

TABLE II

Comparison of ADMiRA and SVT: no noise, n = m = 1000.



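where $\sigma_{r,\max}(\mathcal{A})$ and $\sigma_{r,\min}(\mathcal{A})$ are defined by

$$\sigma_{r,\max}(\mathcal{A}) \triangleq \max\{\|\mathcal{A}X\|_2 : \|X\|_F = 1,\; \mathrm{rank}(X) \le r\}$$

and

$$\sigma_{r,\min}(\mathcal{A}) \triangleq \min\{\|\mathcal{A}X\|_2 : \|X\|_F = 1,\; \mathrm{rank}(X) \le r\},$$

respectively. As $r$ increases, the feasible sets of both problems increase, and hence $\sigma_{r,\max}(\mathcal{A})$ and $\sigma_{r,\min}(\mathcal{A})$ are nondecreasing and nonincreasing, respectively. Therefore, (22) implies that $\delta_r(\mathcal{A})$ is nondecreasing in $r$.

B. Proof of Proposition 5.4

Let $d = \dim(\mathrm{span}(\Psi))$ and let $\Phi = \{\phi_j\}_{j=1}^{d}$ be an orthonormal basis of $\mathrm{span}(\Psi)$.¹ Then $\mathcal{L}_\Phi$ is an isometry that satisfies $\mathcal{P}_\Psi = \mathcal{L}_\Phi\mathcal{L}_\Phi^*$. Since $\mathrm{rank}(\mathcal{L}_\Phi a) \le |\Psi| \le r$ and $\|\mathcal{L}_\Phi a\|_F = \|a\|_2$ for all $a \in \mathbb{C}^d$, by the R-RIP

$$\|\mathcal{A}\mathcal{L}_\Phi a\|_2 \le \sqrt{1 + \delta_r(\mathcal{A})}\,\|\mathcal{L}_\Phi a\|_F = \sqrt{1 + \delta_r(\mathcal{A})}\,\|a\|_2, \quad \forall a \in \mathbb{C}^d. \tag{23}$$

This implies that the operator norm of $\mathcal{A}\mathcal{L}_\Phi$ is bounded from above by $\sqrt{1 + \delta_r(\mathcal{A})}$. Since the adjoint operator $[\mathcal{A}\mathcal{L}_\Phi]^*$ has the same operator norm,

$$\|\mathcal{L}_\Phi[\mathcal{A}\mathcal{L}_\Phi]^* b\|_F = \|[\mathcal{A}\mathcal{L}_\Phi]^* b\|_2 \le \sqrt{1 + \delta_r(\mathcal{A})}\,\|b\|_2, \quad \forall b \in \mathbb{C}^p. \tag{24}$$

Then the claim of the proposition follows from (24) with $\mathcal{L}_\Phi[\mathcal{A}\mathcal{L}_\Phi]^* = \mathcal{L}_\Phi\mathcal{L}_\Phi^*\mathcal{A}^* = \mathcal{P}_\Psi\mathcal{A}^*$.

¹ Note that $\Phi$ is not necessarily a set of atoms in $\mathbb{O}$.

C. Proof of Proposition 5.5

Let $Y = \mathcal{P}_\Psi\mathcal{A}^*\mathcal{A}X$. Then $\mathrm{rank}(Y) \le |\Psi| \le r$. By the R-RIP,

$$|\langle \mathcal{A}X, \mathcal{A}Y \rangle_{\mathbb{C}^p}|^2 \le \|\mathcal{A}X\|_2^2\,\|\mathcal{A}Y\|_2^2 \le (1 + \delta_r(\mathcal{A}))^2\,\|X\|_F^2\,\|Y\|_F^2.$$

Therefore

$$\langle \mathcal{A}X, \mathcal{A}Y \rangle_{\mathbb{C}^p} = \langle \mathcal{A}X, \mathcal{A}\mathcal{P}_\Psi\mathcal{A}^*\mathcal{A}X \rangle_{\mathbb{C}^p} = \langle \mathcal{P}_\Psi\mathcal{A}^*\mathcal{A}X, \mathcal{P}_\Psi\mathcal{A}^*\mathcal{A}X \rangle_{\mathbb{C}^{m \times n}} = \|\mathcal{P}_\Psi\mathcal{A}^*\mathcal{A}X\|_F^2 \le (1 + \delta_r(\mathcal{A}))\,\|X\|_F\,\|\mathcal{P}_\Psi\mathcal{A}^*\mathcal{A}X\|_F.$$

D. Proof of Proposition 5.6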

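Since $\mathcal{P}$ and $\mathcal{P}_\Psi$ are commuting projection operators, $\mathcal{P}\mathcal{P}_\Psi$ is a projection operator onto $S = \mathcal{R}(\mathcal{P}) \cap \mathcal{R}(\mathcal{P}_\Psi)$. Let $d = \dim(S)$ and let $\Phi = \{\phi_k\}_{k=1}^{d} \subset \mathbb{C}^{m \times n}$ be an orthonormal basis of $S$; then $\mathcal{L}_\Phi$ is an isometry that satisfies $\mathcal{P}\mathcal{P}_\Psi = \mathcal{L}_\Phi\mathcal{L}_\Phi^*$. Since $S \subset \mathcal{R}(\mathcal{P}_\Psi)$, $\mathrm{rank}(\mathcal{L}_\Phi a) \le |\Psi| \le r$ for all $a \in \mathbb{C}^d$. Therefore, by the R-RIP,

$$\sqrt{1 - \delta_r(\mathcal{A})}\,\|a\|_2 = \sqrt{1 - \delta_r(\mathcal{A})}\,\|\mathcal{L}_\Phi a\|_F \le \|\mathcal{A}\mathcal{L}_\Phi a\|_2, \quad \forall a \in \mathbb{C}^d,$$

where the first equality holds since $\mathcal{L}_\Phi$ is an isometry. By the relationship between $\mathcal{A}\mathcal{L}_\Phi$ and $\mathcal{L}_\Phi^*\mathcal{A}^*\mathcal{A}\mathcal{L}_\Phi$, it follows that

$$(1 - \delta_r(\mathcal{A}))\,\|a\|_2 \le \|\mathcal{L}_\Phi^*\mathcal{A}^*\mathcal{A}\mathcal{L}_\Phi a\|_2, \quad \forall a \in \mathbb{C}^d.$$

For each $X \in \mathbb{C}^{m \times n}$, there exists $a \in \mathbb{C}^d$ such that $\mathcal{P}\mathcal{P}_\Psi X = \mathcal{L}_\Phi a$. Then

$$
\begin{aligned}
\|\mathcal{P}\mathcal{P}_\Psi\mathcal{A}^*\mathcal{A}\mathcal{P}\mathcal{P}_\Psi X\|_F &= \|\mathcal{L}_\Phi\mathcal{L}_\Phi^*\mathcal{A}^*\mathcal{A}\mathcal{L}_\Phi a\|_F = \|\mathcal{L}_\Phi^*\mathcal{A}^*\mathcal{A}\mathcal{L}_\Phi a\|_2 \\
&\ge (1 - \delta_r(\mathcal{A}))\,\|a\|_2 = (1 - \delta_r(\mathcal{A}))\,\|\mathcal{L}_\Phi a\|_F = (1 - \delta_r(\mathcal{A}))\,\|\mathcal{P}\mathcal{P}_\Psi X\|_F.
\end{aligned}
$$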

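E. Proof of Proposition 5.7

Assume that $\|X\|_F = \|Y\|_F = 1$. Let $\alpha \in \mathbb{C}$ be a constant of unit modulus, i.e., $|\alpha| = 1$. By the assumption, $\mathrm{rank}(X + \alpha Y) \le r$. By the orthogonality of $X$ and $Y$, $\|X + \alpha Y\|_F^2 = \|X\|_F^2 + |\alpha|^2\,\|Y\|_F^2 = 2$. Therefore

$$2(1 - \delta_r(\mathcal{A})) \le \|\mathcal{A}X + \alpha\mathcal{A}Y\|_2^2 \le 2(1 + \delta_r(\mathcal{A})).$$

In particular, the inequality holds for $\alpha = \pm 1, \pm i$, where $i = \sqrt{-1}$. By the parallelogram identity,

$$
\begin{aligned}
|\langle \mathcal{A}X, \mathcal{A}Y \rangle_{\mathbb{C}^p}|^2 &= \frac{1}{16}\left| \|\mathcal{A}X + \mathcal{A}Y\|_2^2 - \|\mathcal{A}X - \mathcal{A}Y\|_2^2 + i\,\|\mathcal{A}X + i\mathcal{A}Y\|_2^2 - i\,\|\mathcal{A}X - i\mathcal{A}Y\|_2^2 \right|^2 \\
&= \frac{1}{16}\left[ \left( \|\mathcal{A}X + \mathcal{A}Y\|_2^2 - \|\mathcal{A}X - \mathcal{A}Y\|_2^2 \right)^2 + \left( \|\mathcal{A}X + i\mathcal{A}Y\|_2^2 - \|\mathcal{A}X - i\mathcal{A}Y\|_2^2 \right)^2 \right] \\
&\le 2\,[\delta_r(\mathcal{A})]^2.
\end{aligned}
$$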
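
The identity used in the last step can be checked numerically. The sketch below is illustrative only (the function name is hypothetical); it uses the convention $\langle a, b \rangle = b^H a$, linear in the first argument, which is an assumption about the paper's convention that does not affect the modulus being bounded.

```python
import numpy as np


def inner_via_polarization(a, b):
    """Recover <a, b> = b^H a from four squared norms (complex polarization identity)."""
    sq = lambda v: np.linalg.norm(v) ** 2
    return 0.25 * (sq(a + b) - sq(a - b) + 1j * sq(a + 1j * b) - 1j * sq(a - 1j * b))


rng = np.random.default_rng(0)
a = rng.standard_normal(8) + 1j * rng.standard_normal(8)
b = rng.standard_normal(8) + 1j * rng.standard_normal(8)
print(inner_via_polarization(a, b), np.vdot(b, a))  # np.vdot conjugates its first argument
```

Each of the two differences of squared norms is at most $4\delta_r(\mathcal{A})$ in magnitude under the bounds above, which gives $\frac{1}{16} \cdot 2 \cdot (4\delta_r(\mathcal{A}))^2 = 2[\delta_r(\mathcal{A})]^2$. In the real case only the first difference is present, which is why the factor $\sqrt{2}$ can be dropped.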



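F. Proof of Corollary 5.9

For an arbitrary matrix $Y \in \mathbb{C}^{m \times n}$,

$$\langle \mathcal{P}_\Psi X, \mathcal{P}_\Upsilon\mathcal{P}_\Psi Y \rangle_{\mathbb{C}^{m \times n}} = \langle X, \mathcal{P}_\Psi\mathcal{P}_\Upsilon Y \rangle_{\mathbb{C}^{m \times n}} = \langle X, \mathcal{P}_\Upsilon\mathcal{P}_\Psi Y \rangle_{\mathbb{C}^{m \times n}} = \langle \mathcal{P}_\Upsilon X, \mathcal{P}_\Upsilon\mathcal{P}_\Psi Y \rangle_{\mathbb{C}^{m \times n}} = 0$$

and

$$\mathrm{rank}(\mathcal{P}_\Psi X + \alpha\mathcal{P}_\Upsilon\mathcal{P}_\Psi Y) \le \mathrm{rank}(\mathcal{P}_\Psi(X + \alpha\mathcal{P}_\Upsilon Y)) \le |\Psi| \le r$$

for all $\alpha \in \mathbb{C}$. Therefore Proposition 5.7 implies

$$|\langle \mathcal{A}\mathcal{P}_\Psi X, \mathcal{A}\mathcal{P}_\Upsilon\mathcal{P}_\Psi Y \rangle_{\mathbb{C}^p}| \le \sqrt{2}\,\delta_r(\mathcal{A})\,\|\mathcal{P}_\Psi X\|_F\,\|\mathcal{P}_\Upsilon\mathcal{P}_\Psi Y\|_F.$$

Since $Y$ was arbitrary, we can take $Y = \mathcal{A}^*\mathcal{A}\mathcal{P}_\Psi X$. Then

$$
\begin{aligned}
\langle \mathcal{A}\mathcal{P}_\Psi X, \mathcal{A}\mathcal{P}_\Upsilon\mathcal{P}_\Psi Y \rangle_{\mathbb{C}^p} &= \langle \mathcal{P}_\Upsilon\mathcal{P}_\Psi\mathcal{A}^*\mathcal{A}\mathcal{P}_\Psi X, \mathcal{P}_\Upsilon\mathcal{P}_\Psi\mathcal{A}^*\mathcal{A}\mathcal{P}_\Psi X \rangle_{\mathbb{C}^{m \times n}} \\
&= \|\mathcal{P}_\Upsilon\mathcal{P}_\Psi\mathcal{A}^*\mathcal{A}\mathcal{P}_\Psi X\|_F^2 \\
&\le \sqrt{2}\,\delta_r(\mathcal{A})\,\|\mathcal{P}_\Psi X\|_F\,\|\mathcal{P}_\Upsilon\mathcal{P}_\Psi\mathcal{A}^*\mathcal{A}\mathcal{P}_\Psi X\|_F.
\end{aligned}
$$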

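G. Proof of Proposition 5.11

We modify the proof of the analogous result for the vector case in [13] for our proposition.
For $\Psi \subset \mathbb{O}$, the unit ball in the subspace spanned by $\Psi$ is defined by

$$B_F^{\Psi} \triangleq \{X \in \mathbb{C}^{m \times n} : X \in \mathrm{span}(\Psi),\; \|X\|_F \le 1\}.$$

Define the convex body

$$S \triangleq \mathrm{conv}\Big\{ \bigcup_{|\Psi| \le r} B_F^{\Psi} \Big\},$$

where $\mathrm{conv}\{G\}$ denotes the convex hull of set $G$. By the assumption, the operator norm satisfies

$$\|\mathcal{A}\|_{S \to 2} = \max_{X \in S} \|\mathcal{A}X\|_2 \le \sqrt{1 + \delta_r(\mathcal{A})}.$$

Define the second convex body

$$K \triangleq \Big\{ X \in \mathbb{C}^{m \times n} : \|X\|_F + \frac{1}{\sqrt{r}}\|X\|_* \le 1 \Big\}$$

and consider the operator norm

$$\|\mathcal{A}\|_{K \to 2} = \max_{X \in K} \|\mathcal{A}X\|_2.$$

The claim of the proposition is equivalent to

$$\|\mathcal{A}\|_{K \to 2} \le \|\mathcal{A}\|_{S \to 2}.$$

It suffices to show that $K \subset S$. Let $X$ be an element in $K$. Consider the singular value decomposition of $X$,

$$X = \sum_{k=1}^{\mathrm{rank}(X)} \sigma_k u_k v_k^H$$

with $\sigma_{k+1} \le \sigma_k$. Let $\sigma_k = 0$ if $k > \mathrm{rank}(X)$ and $J = \lceil \mathrm{rank}(X)/r \rceil - 1$, where $\lceil c \rceil$ is the smallest integer equal to or greater than $c$. Then we have the following decomposition:

$$X = \sum_{j=0}^{J} \sum_{k=rj+1}^{r(j+1)} \sigma_k u_k v_k^H = \sum_{j=0}^{J} c_j Y_j,$$

where

$$c_j \triangleq \Big\| \sum_{k=rj+1}^{r(j+1)} \sigma_k u_k v_k^H \Big\|_F \quad \text{and} \quad Y_j \triangleq \frac{1}{c_j} \sum_{k=rj+1}^{r(j+1)} \sigma_k u_k v_k^H.$$

For each $j \in \{1, \ldots, J\}$,

$$c_j = \Big( \sum_{k=rj+1}^{r(j+1)} \sigma_k^2 \Big)^{1/2} \le \sqrt{r}\,\sigma_{rj+1} \le \frac{1}{\sqrt{r}} \sum_{k=r(j-1)+1}^{rj} \sigma_k.$$

Therefore

$$\sum_{j=1}^{J} c_j \le \frac{1}{\sqrt{r}} \sum_{j=1}^{J} \sum_{k=r(j-1)+1}^{rj} \sigma_k \le \frac{1}{\sqrt{r}} \sum_{k=1}^{\mathrm{rank}(X)} \sigma_k = \frac{1}{\sqrt{r}}\|X\|_*.$$

From the definition of $c_0$, it follows that $c_0 \le \|X\|_F$. Since $X \in K$, we note

$$\sum_{j=0}^{J} c_j \le \|X\|_F + \frac{1}{\sqrt{r}}\|X\|_* \le 1.$$

Also note that $Y_j \in S$ for all $j = 0, \ldots, J$, since $\mathrm{rank}(Y_j) \le r$ and $\|Y_j\|_F = 1$ by construction. Therefore $X$ is a convex combination of elements in $S$. Since $S$ is a convex hull, $X \in S$.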



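H. Proof of Lemma 6.2

Let $\Phi = \mathrm{atoms}(X_0 - \widehat{X})$. Since $|\Phi| \le \mathrm{rank}(X_0) + \mathrm{rank}(\widehat{X}) \le 2r$, it follows by the selection rule of $\Psi'$ that

$$\|\mathcal{P}_\Phi\mathcal{A}^*(b - \mathcal{A}\widehat{X})\|_F \le \|\mathcal{P}_{\Psi'}\mathcal{A}^*(b - \mathcal{A}\widehat{X})\|_F. \tag{25}$$

Let $\Upsilon \subset \mathbb{O}$ be a set of atoms that spans $\mathrm{span}(\Phi) \cap \mathrm{span}(\Psi')$. Then $\mathcal{P}_\Upsilon$ and $\mathcal{P}_\Phi$ commute,

$$\mathcal{P}_\Upsilon\mathcal{P}_\Phi = \mathcal{P}_\Phi\mathcal{P}_\Upsilon \quad \text{and} \quad \mathcal{P}_\Upsilon^{\perp}\mathcal{P}_\Phi = \mathcal{P}_\Phi\mathcal{P}_\Upsilon^{\perp}, \tag{26}$$

as do $\mathcal{P}_\Upsilon$ and $\mathcal{P}_{\Psi'}$,

$$\mathcal{P}_\Upsilon\mathcal{P}_{\Psi'} = \mathcal{P}_{\Psi'}\mathcal{P}_\Upsilon \quad \text{and} \quad \mathcal{P}_\Upsilon^{\perp}\mathcal{P}_{\Psi'} = \mathcal{P}_{\Psi'}\mathcal{P}_\Upsilon^{\perp}. \tag{27}$$

From the commutativity, we note that $\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_\Phi$ and $\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_{\Psi'}$ are projection operators. Furthermore, each of the projection operators $\mathcal{P}_\Phi$ and $\mathcal{P}_{\Psi'}$ can be decomposed as the sum of two mutually orthogonal projection operators:

$$\mathcal{P}_\Phi = \mathcal{P}_\Upsilon + \mathcal{P}_\Upsilon^{\perp}\mathcal{P}_\Phi, \tag{28}$$

$$\mathcal{P}_{\Psi'} = \mathcal{P}_\Upsilon + \mathcal{P}_\Upsilon^{\perp}\mathcal{P}_{\Psi'}. \tag{29}$$

Applying (28) and (29) to (25), invoking the Pythagorean theorem, and removing the common term containing $\mathcal{P}_\Upsilon$ gives

$$\|\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_\Phi\mathcal{A}^*(b - \mathcal{A}\widehat{X})\|_F \le \|\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_{\Psi'}\mathcal{A}^*(b - \mathcal{A}\widehat{X})\|_F. \tag{30}$$

First, we derive an upper bound on the right-hand side of inequality (30):

$$
\begin{aligned}
\|\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_{\Psi'}\mathcal{A}^*(b - \mathcal{A}\widehat{X})\|_F &= \|\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_{\Psi'}\mathcal{A}^*[\mathcal{A}(X_0 - \widehat{X}) + \nu]\|_F \\
&\le \|\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_{\Psi'}\mathcal{A}^*\mathcal{A}(X_0 - \widehat{X})\|_F + \|\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_{\Psi'}\mathcal{A}^*\nu\|_F.
\end{aligned} \tag{31}
$$

Using Proposition 5.4 and $|\Psi'| \le 2r$, the second term of (31) is bounded by

$$\|\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_{\Psi'}\mathcal{A}^*\nu\|_F \le \|\mathcal{P}_{\Psi'}\mathcal{A}^*\nu\|_F \le \sqrt{1 + \delta_{2r}(\mathcal{A})}\,\|\nu\|_2.$$

The first term of (31) is further bounded by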



r^V^,'A*A{Xo - X) 



V^V^,'A*A{Vt + V^V^)iXo - X) 
V^V^'A*Arr{Xa- X 



V^V^-A*AV^'VT{Xa - X) 



V^V^'A*AViV^iXo - X) 
r^'A*AV^ViiXa-X) 



^ V2S2riA) Vt{Xo-X) +il+S2r{A)) V^{Xo~X) 

F 

where the third inequality follows from Corollarv 15.91 with V^Vx^i = V^'V^ and VyVriXa — X) = and Proposition 15.51 
with I*' I ^ 2r and TankiV^V-^ (Xq - X)) 2r. 

Combining the previous results, we have the following upper bound on the right hand side of inequality ( |30] |. 



r^r^'A*ib-Ax) 



^ V2S2r{A) Vt{Xo~X) +{l + d2r{A)) Vi{Xo-X) 

F 



+v/T+M^||Hl2. 

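Next, we derive a lower bound on the left-hand side of inequality (30):

$$
\begin{aligned}
\|\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_\Phi\mathcal{A}^*(b - \mathcal{A}\widehat{X})\|_F &= \|\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_\Phi\mathcal{A}^*[\mathcal{A}(X_0 - \widehat{X}) + \nu]\|_F \\
&\ge \|\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_\Phi\mathcal{A}^*\mathcal{A}(X_0 - \widehat{X})\|_F - \|\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_\Phi\mathcal{A}^*\nu\|_F.
\end{aligned} \tag{33}
$$

Using Proposition 5.4, the second term of (33) is further bounded by

$$-\|\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_\Phi\mathcal{A}^*\nu\|_F \ge -\|\mathcal{P}_\Phi\mathcal{A}^*\nu\|_F \ge -\sqrt{1 + \delta_{2r}(\mathcal{A})}\,\|\nu\|_2.$$

The first term of (33) is further bounded by

$$
\begin{aligned}
\|\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_\Phi\mathcal{A}^*\mathcal{A}(X_0 - \widehat{X})\|_F &= \|\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_\Phi\mathcal{A}^*\mathcal{A}(\mathcal{P}_\Upsilon + \mathcal{P}_\Upsilon^{\perp}\mathcal{P}_\Phi)(X_0 - \widehat{X})\|_F \\
&\ge \|\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_\Phi\mathcal{A}^*\mathcal{A}\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_\Phi(X_0 - \widehat{X})\|_F - \|\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_\Phi\mathcal{A}^*\mathcal{A}\mathcal{P}_\Upsilon(X_0 - \widehat{X})\|_F \\
&= \|\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_\Phi\mathcal{A}^*\mathcal{A}\mathcal{P}_\Phi\mathcal{P}_\Upsilon^{\perp}(X_0 - \widehat{X})\|_F - \|\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_\Phi\mathcal{A}^*\mathcal{A}\mathcal{P}_\Phi\mathcal{P}_\Upsilon(X_0 - \widehat{X})\|_F \\
&\ge (1 - \delta_{2r}(\mathcal{A}))\,\|\mathcal{P}_\Upsilon^{\perp}(X_0 - \widehat{X})\|_F - \sqrt{2}\,\delta_{2r}(\mathcal{A})\,\|\mathcal{P}_\Upsilon(X_0 - \widehat{X})\|_F && (34) \\
&\ge (1 - \delta_{2r}(\mathcal{A}))\,\|\mathcal{P}_{\Psi'}^{\perp}(X_0 - \widehat{X})\|_F - \sqrt{2}\,\delta_{2r}(\mathcal{A})\,\|\mathcal{P}_\Upsilon(X_0 - \widehat{X})\|_F, && (35)
\end{aligned}
$$

where the inequality (34) follows from Proposition 5.6 with $\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_\Phi = \mathcal{P}_\Phi\mathcal{P}_\Upsilon^{\perp}$ and $|\Phi| \le 2r$ and from Corollary 5.9, and the last inequality (35) follows from the fact that $\mathrm{span}(\Psi')^{\perp} \subset \mathrm{span}(\Upsilon)^{\perp}$.

Combining the previous results, we have the following lower bound on the left-hand side of inequality (30):

$$
\begin{aligned}
\|\mathcal{P}_\Upsilon^{\perp}\mathcal{P}_\Phi\mathcal{A}^*(b - \mathcal{A}\widehat{X})\|_F \ge{} & (1 - \delta_{2r}(\mathcal{A}))\,\|\mathcal{P}_{\Psi'}^{\perp}(X_0 - \widehat{X})\|_F - \sqrt{2}\,\delta_{2r}(\mathcal{A})\,\|\mathcal{P}_\Upsilon(X_0 - \widehat{X})\|_F \\
& - \sqrt{1 + \delta_{2r}(\mathcal{A})}\,\|\nu\|_2.
\end{aligned} \tag{36}
$$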
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Combining (O, ([32]i and yields 

1 + S2r{A) 



1 - S2r{A) 
1 



F 1 - 52r{A) 
AV252r{A){l + 52r{A)) 



Vt{Xo-X) 



Wl + S2riA) 
l-S2r{A) 



1 - ^(1 + «2,M))2 + SS2M)' ""' l-'S2r(^) 

where the second inequality is obtained by maximizing over Vr with the constraint 

|2 



2 ' 



r^iXo-X) + VriXo-X 



= Xn-X 



Substituting 62,- (A) ^ S^riA) ^ 0.04 gives the constants in the final inequality. 



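I. Proof of Lemma 6.3

Since $\widehat{\Psi} \subset \widetilde{\Psi}$, $\mathcal{P}_{\widehat{\Psi}}^{\perp}\widehat{X} = 0$ implies $\mathcal{P}_{\widetilde{\Psi}}^{\perp}\widehat{X} = 0$ and hence

$$\|\mathcal{P}_{\widetilde{\Psi}}^{\perp}X_0\|_F = \|\mathcal{P}_{\widetilde{\Psi}}^{\perp}(X_0 - \widehat{X})\|_F \le \|\mathcal{P}_{\Psi'}^{\perp}(X_0 - \widehat{X})\|_F,$$

where the inequality holds since $\Psi' \subset \widetilde{\Psi}$ implies $\mathrm{span}(\widetilde{\Psi})^{\perp} \subset \mathrm{span}(\Psi')^{\perp}$.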



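J. Proof of Lemma 6.4

Assume that $\widetilde{\Psi}$ is a linearly independent set of atoms in $\mathbb{O}$. Otherwise, we can replace $\widetilde{\Psi}$ by a maximal linearly independent subset of $\widetilde{\Psi}$.

The minimizer in (16) is given by

$$\widetilde{X} = \mathcal{L}_{\widetilde{\Psi}}[\mathcal{A}\mathcal{L}_{\widetilde{\Psi}}]^{\dagger} b = \mathcal{L}_{\widetilde{\Psi}}[\mathcal{A}\mathcal{L}_{\widetilde{\Psi}}]^{\dagger}(\mathcal{A}X_0 + \nu).$$

By the triangle inequality,

$$
\begin{aligned}
\|X_0 - \widetilde{X}\|_F &\le \|\underbrace{X_0 - \mathcal{L}_{\widetilde{\Psi}}[\mathcal{A}\mathcal{L}_{\widetilde{\Psi}}]^{\dagger}\mathcal{A}X_0}_{\mathrm{rank} \le 4r}\|_F + \|\underbrace{\mathcal{L}_{\widetilde{\Psi}}[\mathcal{A}\mathcal{L}_{\widetilde{\Psi}}]^{\dagger}\nu}_{\mathrm{rank} \le 3r}\|_F && (37) \\
&\le \frac{1}{\sqrt{1 - \delta_{4r}(\mathcal{A})}}\,\big\|\mathcal{A}\big(X_0 - \mathcal{L}_{\widetilde{\Psi}}[\mathcal{A}\mathcal{L}_{\widetilde{\Psi}}]^{\dagger}\mathcal{A}X_0\big)\big\|_2 + \frac{1}{\sqrt{1 - \delta_{3r}(\mathcal{A})}}\,\big\|\mathcal{A}\mathcal{L}_{\widetilde{\Psi}}[\mathcal{A}\mathcal{L}_{\widetilde{\Psi}}]^{\dagger}\nu\big\|_2, && (38)
\end{aligned}
$$

where the last inequality follows from the R-RIP of $\mathcal{A}$.

The first term in (38) has the following upper bound:

$$
\begin{aligned}
\big\|\mathcal{A}\big(X_0 - \mathcal{L}_{\widetilde{\Psi}}[\mathcal{A}\mathcal{L}_{\widetilde{\Psi}}]^{\dagger}\mathcal{A}X_0\big)\big\|_2 &= \big\|\mathcal{A}X_0 - \mathcal{A}\mathcal{L}_{\widetilde{\Psi}}[\mathcal{A}\mathcal{L}_{\widetilde{\Psi}}]^{\dagger}\mathcal{A}X_0\big\|_2 = \big\|\mathcal{P}_{\mathcal{R}(\mathcal{A}\mathcal{L}_{\widetilde{\Psi}})}^{\perp}\mathcal{A}X_0\big\|_2 \\
&= \big\|\mathcal{P}_{\mathcal{R}(\mathcal{A}\mathcal{L}_{\widetilde{\Psi}})}^{\perp}\mathcal{A}(\mathcal{P}_{\widetilde{\Psi}} + \mathcal{P}_{\widetilde{\Psi}}^{\perp})X_0\big\|_2 = \big\|\mathcal{P}_{\mathcal{R}(\mathcal{A}\mathcal{L}_{\widetilde{\Psi}})}^{\perp}\mathcal{A}(\mathcal{L}_{\widetilde{\Psi}}\mathcal{L}_{\widetilde{\Psi}}^{\dagger} + \mathcal{P}_{\widetilde{\Psi}}^{\perp})X_0\big\|_2 \\
&= \big\|\mathcal{P}_{\mathcal{R}(\mathcal{A}\mathcal{L}_{\widetilde{\Psi}})}^{\perp}\mathcal{A}\underbrace{\mathcal{P}_{\widetilde{\Psi}}^{\perp}X_0}_{\mathrm{rank} \le 4r}\big\|_2 \le \sqrt{1 + \delta_{4r}(\mathcal{A})}\,\|\mathcal{P}_{\widetilde{\Psi}}^{\perp}X_0\|_F,
\end{aligned} \tag{39}
$$

where $\mathrm{rank}(\mathcal{P}_{\widetilde{\Psi}}^{\perp}X_0) \le 4r$ holds by the subadditivity of the rank in the following way:

$$\mathrm{rank}(\mathcal{P}_{\widetilde{\Psi}}^{\perp}X_0) = \mathrm{rank}(X_0 - \mathcal{P}_{\widetilde{\Psi}}X_0) \le \mathrm{rank}(X_0) + \mathrm{rank}(\mathcal{P}_{\widetilde{\Psi}}X_0) \le r + |\widetilde{\Psi}| \le 4r.$$

The second term in (38) is bounded by

$$\big\|\mathcal{A}\mathcal{L}_{\widetilde{\Psi}}[\mathcal{A}\mathcal{L}_{\widetilde{\Psi}}]^{\dagger}\nu\big\|_2 = \big\|\mathcal{P}_{\mathcal{R}(\mathcal{A}\mathcal{L}_{\widetilde{\Psi}})}\nu\big\|_2 \le \|\nu\|_2. \tag{40}$$

Finally, combining (38), (39), and (40) yields

$$\|X_0 - \widetilde{X}\|_F \le \sqrt{\frac{1 + \delta_{4r}(\mathcal{A})}{1 - \delta_{4r}(\mathcal{A})}}\,\|\mathcal{P}_{\widetilde{\Psi}}^{\perp}X_0\|_F + \frac{1}{\sqrt{1 - \delta_{3r}(\mathcal{A})}}\,\|\nu\|_2.$$

Applying $\delta_{3r}(\mathcal{A}) \le \delta_{4r}(\mathcal{A}) \le 0.04$ completes the proof.

K. Proof of Lemma 6.5

$$\|X_0 - \widetilde{X}_r\|_F \le \|X_0 - \widetilde{X}\|_F + \|\widetilde{X} - \widetilde{X}_r\|_F \le 2\,\|X_0 - \widetilde{X}\|_F,$$

where the second inequality holds by the definition of the best rank-$r$ approximation.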
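
The factor of 2 can be observed numerically with a truncated SVD, which gives the best rank-$r$ approximation in the Frobenius norm. A minimal sketch follows; the function name and the random test data are illustrative assumptions.

```python
import numpy as np


def best_rank_r(X, r):
    """Best rank-r approximation of X in Frobenius norm via truncated SVD."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vh[:r]


rng = np.random.default_rng(0)
r = 2
X0 = rng.standard_normal((20, r)) @ rng.standard_normal((r, 15))  # rank-r target
X_tilde = X0 + 0.1 * rng.standard_normal((20, 15))                # higher-rank estimate
err_before = np.linalg.norm(X0 - X_tilde)
err_after = np.linalg.norm(X0 - best_rank_r(X_tilde, r))
print(err_after <= 2 * err_before)
```

The bound holds because $X_0$ itself is a rank-$r$ candidate, so the truncation error $\|\widetilde{X} - \widetilde{X}_r\|_F$ cannot exceed $\|\widetilde{X} - X_0\|_F$.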



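L. Proof of Lemma 7.4

In the $k$-th iteration, the generalized correlation maximization rule chooses $\Psi'_k$. Let $\tilde{\Psi}_k = \hat{\Psi}_{k-1} \cup \Psi'_k$ in the $k$-th iteration. Since $\hat{\Psi}_k$ is chosen as a subset of $\tilde{\Psi}_k$,

$$\|\mathcal{P}^{\perp}_{\tilde{\Psi}_k} X_0\|_F \le \|\mathcal{P}^{\perp}_{\hat{\Psi}_k} X_0\|_F, \quad \forall k \in \mathbb{Z}_+.$$

Then Lemma 6.4 and Lemma 6.5 imply

$$\|X_0 - \hat{X}_k\|_F \le 2\left(1.04\, \|\mathcal{P}^{\perp}_{\tilde{\Psi}_k} X_0\|_F + 1.02\, \|\nu\|_2\right) \le 2.08\, \|\mathcal{P}^{\perp}_{\hat{\Psi}_k} X_0\|_F + 2.04\, \|\nu\|_2 \tag{41}$$

for all $k \in \mathbb{Z}_+$.

If $\|\mathcal{P}^{\perp}_{\hat{\Psi}_k} X_0\|_F < 30\|\nu\|_2$ for some $k$, then (41) implies (19). The first possibility has been shown.

Otherwise, we need to show that (20) and (21) hold. Assume that $\|\mathcal{P}^{\perp}_{\hat{\Psi}_k} X_0\|_F \ge 30\|\nu\|_2$ for some $k \in \mathbb{Z}_+$. Then (41) implies (20). Furthermore, we have

$$\|X_0 - \hat{X}_k\|_F \ge \|\mathcal{P}^{\perp}_{\hat{\Psi}_k}(X_0 - \hat{X}_k)\|_F = \|\mathcal{P}^{\perp}_{\hat{\Psi}_k} X_0\|_F \ge 30\|\nu\|_2.$$

Therefore Theorem 6.1 ensures that (21) holds.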
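The step from (41) to (20) is arithmetic on the constants. A short worked version follows, assuming that (20) is the bound with constant 2.15 that is used later in the proof of Lemma 7.5, and that the residual norm is at least $30\|\nu\|_2$:

```latex
\begin{align*}
\|X_0 - \hat{X}_k\|_F
  &\le 2.08\,\|\mathcal{P}^{\perp}_{\hat{\Psi}_k} X_0\|_F + 2.04\,\|\nu\|_2 \\
  &\le \Bigl(2.08 + \tfrac{2.04}{30}\Bigr)\,\|\mathcal{P}^{\perp}_{\hat{\Psi}_k} X_0\|_F
     && \text{since } \|\nu\|_2 \le \tfrac{1}{30}\,\|\mathcal{P}^{\perp}_{\hat{\Psi}_k} X_0\|_F \\
  &= 2.148\,\|\mathcal{P}^{\perp}_{\hat{\Psi}_k} X_0\|_F
   \;\le\; 2.15\,\|\mathcal{P}^{\perp}_{\hat{\Psi}_k} X_0\|_F .
\end{align*}
```

The leading constant $2.08 = 2 \times 1.04$ and the noise constant $2.04 = 2 \times 1.02$ come from composing Lemma 6.4 with the factor 2 of Lemma 6.5.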



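M. Proof of Lemma 7.5

Let $J$ be the set of indices of the nonempty atomic bands.

Claim 1: Fix an index $j \in J$. If

$$\|X_0 - \hat{X}_k\|_F \le 2^{-(j+1)/2}\, \|X_0\|_F \tag{42}$$

for some $k$, then

$$B_j \subset \hat{\Psi}_\ell = \mathrm{atoms}(\hat{X}_\ell), \quad \forall \ell \ge k. \tag{43}$$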



Proof: (Claim 1) First, we show that $B_j \subset \hat{\Psi}_k$. Assume that $B_j \not\subset \hat{\Psi}_k$. Then there exists $\psi \in B_j$ such that $\psi \notin \hat{\Psi}_k$. This implies

$$\|X_0 - \hat{X}_k\|_F \ge \|\mathcal{P}_{\psi}(X_0 - \hat{X}_k)\|_F = \|\mathcal{P}_{\psi} X_0\|_F > 2^{-(j+1)/2}\, \|X_0\|_F, \tag{44}$$

which is a contradiction. From the assumption, (21) ensures that

$$\|X_0 - \hat{X}_\ell\|_F \le \|X_0 - \hat{X}_k\|_F \le 2^{-(j+1)/2}\, \|X_0\|_F, \quad \forall \ell \ge k. \tag{45}$$

Then Claim 1 follows. ■

Equation (43) implies that the atomic band $B_j$ has already been identified in the $k$-th iteration. From the condition for atomic band identification in (42), it follows that the identification of $B_j$ implies the identification of $B_\ell$ for all $\ell \le j$.
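Claim 2: Assume that $B_i$ has already been identified for all $i < j$. Let $\beta = 4/3$. After at most

$$\ell = \log_\beta \left\lceil \frac{2.15\, \bigl\|\sum_{i \ge j} Y_i\bigr\|_F}{2^{-(j+1)/2}\, \|X_0\|_F} \right\rceil \tag{46}$$

more iterations, $B_j$ is identified.

Proof: (Claim 2) We start with the $k$-th iteration. Since $B_i$ has already been identified for all $i < j$, $B_i$ is a subset of $\hat{\Psi}_k$ for all $i < j$. In other words, $\sum_{i<j} \mathrm{span}(B_i) \subset \mathrm{span}(\hat{\Psi}_k)$ and hence

$$\sum_{i \ge j} \mathrm{span}(B_i) = \mathrm{span}(\Phi) \cap \Bigl(\sum_{i < j} \mathrm{span}(B_i)\Bigr)^{\perp} \supset \mathrm{span}(\Phi) \cap \mathrm{span}(\hat{\Psi}_k)^{\perp}, \tag{47}$$

where $\Phi = \mathrm{atoms}(X_0)$. Since $X_0 \in \mathrm{span}(\Phi)$, $\mathcal{P}^{\perp}_{\hat{\Psi}_k} X_0$ is the projection of $X_0$ onto $\mathrm{span}(\Phi) \cap \mathrm{span}(\hat{\Psi}_k)^{\perp}$ and therefore

$$\|\mathcal{P}^{\perp}_{\hat{\Psi}_k} X_0\|_F \le \Bigl\|\sum_{i \ge j} Y_i\Bigr\|_F, \tag{48}$$

where the inequality follows from (47). Also note that by assumption (20) holds. Combining (20) and (48), it follows that

$$\|X_0 - \hat{X}_k\|_F \le 2.15\, \|\mathcal{P}^{\perp}_{\hat{\Psi}_k} X_0\|_F \le 2.15\, \Bigl\|\sum_{i \ge j} Y_i\Bigr\|_F.$$

Now, $B_j$ is identified in the $(k+\ell)$-th iteration if

$$\|X_0 - \hat{X}_{k+\ell}\|_F \le 2^{-(j+1)/2}\, \|X_0\|_F. \tag{49}$$

It is easily verified that $\ell$ given in (46) satisfies (49). ■

The total number of iterations required to identify $B_j$ for all $j \in J$ is at most

$$k^\star = \sum_{j \in J} \log_\beta \left\lceil \frac{2.15 \cdot 2^{(j+1)/2}\, \bigl\|\sum_{i \ge j} Y_i\bigr\|_F}{\|X_0\|_F} \right\rceil.$$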



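For each $k \ge \lfloor k^\star \rfloor$, we have $\mathrm{atoms}(\hat{X}_k) = \mathrm{atoms}(X_0)$. It remains to bound $k^\star$ in terms of the profile $t = \mathrm{profile}(X_0)$. First, note that $t = |J|$. Using Jensen's inequality, we have

$$\exp\left\{\frac{1}{t} \sum_{j \in J} \ln \left\lceil \frac{2.15 \cdot 2^{(j+1)/2}\, \bigl\|\sum_{i \ge j} Y_i\bigr\|_F}{\|X_0\|_F} \right\rceil\right\} \le \exp\left\{\frac{1}{t} \sum_{j \in J} \ln \left(1 + \frac{2.15 \cdot 2^{(j+1)/2}\, \bigl\|\sum_{i \ge j} Y_i\bigr\|_F}{\|X_0\|_F}\right)\right\} \le 1 + \frac{2.15}{t} \sum_{j \in J} \left(\frac{2^{j+1} \sum_{i \ge j} \|Y_i\|_F^2}{\|X_0\|_F^2}\right)^{1/2}. \tag{50}$$

Recall the bound on $\|Y_j\|_F$ in (18). We use Jensen's inequality again and simplify the result:

$$\frac{1}{t} \sum_{j \in J} \left(\frac{2^{j+1} \sum_{i \ge j} \|Y_i\|_F^2}{\|X_0\|_F^2}\right)^{1/2} \le \left(\frac{1}{t} \sum_{j \in J} \frac{2^{j+1} \sum_{i \ge j} \|Y_i\|_F^2}{\|X_0\|_F^2}\right)^{1/2} \le 2\sqrt{r/t}. \tag{51}$$

Combining (50) and (51), we have

$$\exp\left\{\frac{1}{t} \sum_{j \in J} \ln \left\lceil \frac{2.15 \cdot 2^{(j+1)/2}\, \bigl\|\sum_{i \ge j} Y_i\bigr\|_F}{\|X_0\|_F} \right\rceil\right\} \le 1 + 4.3\sqrt{r/t}.$$

Taking the logarithm, multiplying by $t$, and dividing both sides by $\ln \beta$, we have

$$k^\star \le t \log_\beta\bigl(1 + 4.3\sqrt{r/t}\bigr).$$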
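The counting argument can be traced on a concrete spectrum. The sketch below is illustrative only. It assumes the atomic bands are defined by $B_j = \{\psi_k : 2^{-(j+1)}\|X_0\|_F^2 < \sigma_k^2 \le 2^{-j}\|X_0\|_F^2\}$, which is how the threshold in (42) reads, and that the components $Y_i$ are mutually orthogonal so that $\|\sum_{i \ge j} Y_i\|_F^2$ is a sum of squared singular values. The function names `atomic_bands` and `k_star` and the sample singular values are hypothetical.

```python
import math
import numpy as np

BETA = 4 / 3   # base of the logarithm in Claim 2
C20 = 2.15     # constant appearing in (46)

def atomic_bands(singular_values):
    """Map band index j -> singular values in B_j.

    Assumed band rule: 2^-(j+1) < sigma^2 / ||X0||_F^2 <= 2^-j.
    """
    s = np.asarray(singular_values, dtype=float)
    s = s[s > 0]
    ratio = np.minimum(s**2 / np.sum(s**2), 1.0)  # guard against rounding above 1
    bands = {}
    for sigma, j in zip(s, np.floor(-np.log2(ratio)).astype(int)):
        bands.setdefault(int(j), []).append(float(sigma))
    return bands

def k_star(singular_values):
    """Sum of log_beta(ceil(...)) over nonempty bands, plus the profile t."""
    bands = atomic_bands(singular_values)
    norm_x0 = math.sqrt(sum(v**2 for b in bands.values() for v in b))
    total = 0.0
    for j in bands:
        # Frobenius norm of the part of X0 carried by bands i >= j.
        tail = math.sqrt(sum(v**2 for i, b in bands.items() if i >= j for v in b))
        arg = C20 * 2 ** ((j + 1) / 2) * tail / norm_x0
        total += math.log(math.ceil(arg), BETA)
    return total, len(bands)

sigmas = [10.0, 5.0, 1.0, 0.1]   # hypothetical spectrum of a rank-4 X0
r = len(sigmas)
k, t = k_star(sigmas)
bound = t * math.log(1 + 4.3 * math.sqrt(r / t), BETA)
print(f"profile t = {t}, k* = {k:.2f}, bound = {bound:.2f}")
```

For this spectrum each singular value falls in its own band, so the profile equals the rank, and the computed sum stays below the closed-form bound as the derivation predicts.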



N. Proof of Theorem 7.3

Let $K = \lfloor t \log_\beta(1 + 4.3\sqrt{r/t}) \rfloor$. Suppose that (19) never holds during the first $K$ iterations. Lemma 7.4 then implies that both (20) and (21) hold for the first $K$ iterations. By Lemma 7.5, all atoms in $\mathrm{atoms}(X_0)$ are identified in the $K$-th iteration, i.e., $\hat{\Psi}_K = \mathrm{atoms}(X_0)$. Since $\hat{\Psi}_K \subset \tilde{\Psi}_K$, $\mathrm{atoms}(X_0)$ is a subset of $\tilde{\Psi}_K$ and hence $\mathcal{P}^{\perp}_{\tilde{\Psi}_K} X_0 = 0$. Lemma 6.4 and Lemma 6.5 then imply

$$\|X_0 - \hat{X}_K\|_F \le 2\left(1.04\, \|\mathcal{P}^{\perp}_{\tilde{\Psi}_K} X_0\|_F + 1.02\, \|\nu\|_2\right) = 2.04\, \|\nu\|_2.$$

This contradicts the assumption that (19) never holds during the first $K$ iterations. Therefore, there exists $k \le K$ where (19) holds. Repeated application of Theorem 6.1 gives

$$\|X_0 - \hat{X}_{k+6}\|_F < 15\|\nu\|_2.$$
O. Proof of Theorem 7.6

Let $X_0 \in \mathbb{C}^{m \times n}$ be an arbitrary matrix and let $t = \mathrm{profile}(X_{0,r})$. By Lemma 6.6, we can rewrite the measurement as $b = \mathcal{A}X_{0,r} + \tilde{\nu}$. Theorem 7.3 states that after at most

$$t \log_{4/3}\bigl(1 + 4.3\sqrt{r/t}\bigr) + 6$$

iterations, the approximation error satisfies

$$\|X_{0,r} - \hat{X}\|_F < 15\|\tilde{\nu}\|_2.$$

Hence

$$\|X_0 - \hat{X}\|_F \le \|X_{0,r} - \hat{X}\|_F + \|X_0 - X_{0,r}\|_F \le 15\|\tilde{\nu}\|_2 + \|X_0 - X_{0,r}\|_F \le 16.3\, \|X_0 - X_{0,r}\|_F + \frac{15.3}{\sqrt{r}}\, \|X_0 - X_{0,r}\|_* + 15\|\nu\|_2 < 17\epsilon,$$

where the third inequality follows from Lemma